* On Tue, Dec 08 2009, Bill Moseley wrote:
> On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <[email protected]> wrote:
>
>> There is no such thing as an octet stream in Perl. There are only
>> strings, and strings are sequences of arbitrarily large integers.
>
> Help me out here.
>
> What I've stuck in my mind is that the poorly-named utf8 flag on Perl
> strings is really the "is_character_data" flag. To get character data
> it *must* be decoded on input, and the act of decoding sets that flag.
> Even decoding 8 bit character encoding will set the flag.

Sorry, it doesn't mean that. latin1 text is character data, but won't
have the UTF8 flag on.

The UTF8 flag doesn't mean anything more than any of the other SV flags.
All of these flags are basically performance hacks and should be
considered totally off-limits to user code. They have absolutely no
meaning there.

> And any strings with the flag set *must* be encoded before printing
> (sending out of Perl) -- otherwise you are printing abstract
> "characters" that have no meaning outside of Perl.

Any string without the flag set must also be encoded.

If text ever enters your application, it must do so through a call to
decode. If text ever leaves your application, it must do so through a
call to encode. Your application must always, without exception, decode
and encode all text data.

It's confusing because this is sometimes done automatically by libraries
that are in use. It's confusing because sometimes it's *not* done by
the libraries that are in use :)

If you're not sure if your library is doing this for you, read the
source, or ask someone :)

Regards,
Jonathan Rockway

--
print just => another => perl => hacker => if $,=$"
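
To make the two points above concrete, here is a minimal sketch using only
core Perl and the Encode module. The variable names and the sample string
are illustrative assumptions, not anything from the thread. The first half
shows that the same text can exist with the UTF8 flag off or on and still
compare equal, so the flag says nothing about whether a string is
"character data". The second half shows the decode-on-input,
encode-on-output round trip, including encoding a string whose flag is off.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# 1. The UTF8 flag describes internal storage, not meaning.
#    (Peeking at the flag here is for demonstration only; application
#    code should not branch on it.)
my $latin1_text = "caf\x{e9}";    # 4 characters, all code points < 256
my $upgraded    = $latin1_text;
utf8::upgrade($upgraded);         # changes the internal representation only

my $flag_before = utf8::is_utf8($latin1_text) ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded)    ? 1 : 0;
my $same_string = $latin1_text eq $upgraded   ? 1 : 0;

# 2. Decode on the way in, encode on the way out.
my $octets_in  = "caf\xc3\xa9";                  # 5 octets of UTF-8, as read from a file or socket
my $text       = decode('UTF-8', $octets_in);    # 4 characters
my $octets_out = encode('UTF-8', $text);         # back to 5 octets

# A string without the flag must be encoded too.
my $encoded_latin1 = encode('UTF-8', $latin1_text);

printf "flag before=%d after=%d equal=%d\n",
    $flag_before, $flag_after, $same_string;
printf "octets in=%d characters=%d octets out=%d\n",
    length($octets_in), length($text), length($octets_out);
printf "flag-less string encodes to the same octets: %s\n",
    $encoded_latin1 eq $octets_in ? "yes" : "no";
```

In a real application the decode and encode calls would sit at the I/O
boundary (or be handled by an I/O layer or a library), and everything in
between would work on decoded text only.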
