On Tue, Dec 8, 2009 at 7:05 PM, Jonathan Rockway <[email protected]> wrote:
> * On Tue, Dec 08 2009, Bill Moseley wrote:
> > On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis <[email protected]> wrote:
> >
> > There is no such thing as an octet stream in Perl. There are only
> > strings, and strings are sequences of arbitrarily large integers.
> >
> > Help me out here.
> >
> > What I've stuck in my mind is that the poorly-named utf8 flag on Perl
> > strings is really the "is_character_data" flag. To get character data
> > it *must* be decoded on input, and the act of decoding sets that flag.
> > Even decoding 8 bit character encoding will set the flag.
>
> Sorry, it doesn't mean that. latin1 text is character data, but won't
> have the UTF8 flag on.

$ perl -MEncode -wle '$x=Encode::decode("Latin1", "hello"); print Encode::is_utf8( $x ) ? "flag set\n" : "no flag\n";'
flag set

> The UTF8 flag doesn't mean anything more than
> any of the other SV flags.

But the flag being on indicates that the string was decoded. And that implies that it needs to be encoded. And if I don't know what encoding to use then it's time to throw an exception.

That's why it seems like the Engine should throw an exception if the utf8 flag is set when it's time to get the length: the encoding is not known, so it's impossible to know the encoded byte length.

-- 
Bill Moseley
[email protected]
_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/
