On Thu, Jan 10, 2008 at 17:29:42 +0100, Peter J. Holzer wrote:
> The same byte sequence, but not the same value. In C (on many systems)
> the single precision floating point number 3.1415927 and the integer
> 1078530011 have the same byte sequence (0xdb 0xf 0x49 0x40 on little
> endian systems), but they hardly have the same value.

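The quoted example is easy to reproduce from Perl itself. What follows
is only a minimal sketch, assuming a perl recent enough to support the
`<` endianness modifier in pack templates (5.10 or later); the variable
names are made up for illustration. It packs the float into four
little-endian bytes and then reads the very same bytes back as an
integer:

```perl
use strict;
use warnings;

# Pack 3.1415927 as a little-endian single-precision float (4 bytes).
my $bytes = pack('f<', 3.1415927);

# Render the raw bytes in hex so they can be compared with the quote.
my $hex = join ' ', map { sprintf('0x%02x', ord($_)) } split //, $bytes;

# Reinterpret the very same 4 bytes as a little-endian unsigned 32-bit int.
my $as_int = unpack('V', $bytes);

# ...and as a single-precision float again.
my $as_float = unpack('f<', $bytes);

print "bytes: $hex\n";
print "int:   $as_int\n";
printf "float: %.7f\n", $as_float;
```

The bytes never change; only the template used to read them does. The
UTF-8 flag plays a similar role for strings: it tells Perl how to
interpret the bytes, it is not part of the bytes themselves.
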
OK, I've got your point, though it's more a question of terminology.

Let me put it another way: my opinion is that C::M (and C::M::F) itself
should not save and restore the UTF-8 flag. Instead, it should work the
same way other Perl data streams work. If you write a string to a file,
no magic flags are stored anywhere. Instead, when you _read_ it back you
say, "alright, please set the UTF-8 flag on the data if it looks like a
UTF-8 string". DBI works the same way (yes, the DBD backends actually,
thanks for pointing that out, but this doesn't make much difference).

Actually, it's possible to store this flag in memcached, so that _when
asked_ to set the UTF-8 flag back, no string scan would be necessary to
check whether the string is really UTF-8. However, I think such an
optimization is not worth the risk of missing some UTF-8 data that was
uploaded through some other memcached client that doesn't set any
special flag, or of setting the UTF-8 flag on a string that was messed
with by append/prepend. You correctly pointed out that this flag is part
of Perl's internals, so it's better not to set it without additional
precautions.

Of course, if the person responsible for C::M accepts Tatsuki-san's
patch, I'll reluctantly add the same functionality to C::M::F. So let's
see how it goes ;).

-- 
Tomash Brechko
