>> > We've discussed this a few times in the past and it's time to make a
>> > final decision about its removal.
>> >
>> > I think most people have agreed that this is the way forward but no
>> > one has produced a patch. I have a student working on unicode
>> > conversion for the Google Summer of Code and this would help make it
>> > simpler.
>>
>> unicode_semantics=on breaks backwards compatibility in scripts that have
>> implemented multiple character set support in current PHP setups.
>
> Why don't you go ahead and make a list of those exact issues then? We
> can then see how to fix those issues. That's much more useful than just
> posting to the mailing list when you don't agree with something. From
> what I've seen with my code base, the changes that I have to do are
> minimal once some (internal) functions are fixed up.

If I remain silent, others will argue that "everybody agrees on removal
of unicode_semantics".

I write and maintain charset decoding and encoding functions.
unicode_semantics breaks every mapping table and every other function that
operates on binary 8-bit strings.

In the slides by Andrei Zmievski, Unicode symbols are written with \u. Why
can they also be written with \x (hex) and \ (octal) escapes in current PHP6?

---
<?php
echo "\xC3\200";
---

I am not writing U+00C3 and U+0080, I am writing U+00C0 in UTF-8.

---
<?php
$string = "ą";
var_dump(preg_replace("/([\300-\337])([\200-\277])/e",
    "'&#'.((ord('\\1')-192)*64+(ord('\\2')-128)).';'",
    $string));
for ($i=0;$i<strlen($string);$i++) {
    $char = ord($string[$i]);
    echo sprintf("=%02X",$char);
}
---

string(6) "&#261;" and '=C4=85' are expected, if "ą" is written in UTF-8.

I can work around it by adding one line to every script that operates on
binary strings, but what guarantee is there that you won't drop declare()
support just as you are dropping unicode_semantics?

What happens to your new Unicode-aware string functions if I lie to the
PHP interpreter about a string's charset? mb_strlen() can't calculate the
correct length of $string even when I pass the correct charset as its
argument. If the above code works as I want in PHP6 with
unicode_semantics=on, mb_strlen($string,'utf-8') returns 2 and not 1.

--
Tomas

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
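
For reference, the two-byte arithmetic that the preg_replace example above
relies on can be checked in isolation. This is only a minimal sketch, and it
assumes a PHP setup where string literals are plain 8-bit byte sequences (as
in PHP 5). The helper names utf8_two_byte_codepoint() and hex_dump_bytes()
are hypothetical. They are not part of PHP or of the original scripts.

```php
<?php
// Hypothetical helper: decode ONE two-byte UTF-8 sequence into its code point.
// Accepts the same byte ranges as the regex above (lead \300-\337,
// trail \200-\277); it does not reject overlong forms (lead 0xC0, 0xC1).
function utf8_two_byte_codepoint($bytes)
{
    if (strlen($bytes) !== 2) {
        throw new InvalidArgumentException('expected exactly two bytes');
    }
    $lead  = ord($bytes[0]);
    $trail = ord($bytes[1]);
    if ($lead < 0xC0 || $lead > 0xDF || $trail < 0x80 || $trail > 0xBF) {
        throw new InvalidArgumentException('not a two-byte UTF-8 sequence');
    }
    // Same arithmetic as the callback: (lead - 192) * 64 + (trail - 128)
    return ($lead - 192) * 64 + ($trail - 128);
}

// Hypothetical helper: byte-wise dump in the "=XX" form used by the loop above.
function hex_dump_bytes($bytes)
{
    $out = '';
    for ($i = 0; $i < strlen($bytes); $i++) {
        $out .= sprintf('=%02X', ord($bytes[$i]));
    }
    return $out;
}

printf("U+%04X\n", utf8_two_byte_codepoint("\xC3\200")); // U+00C0
printf("U+%04X\n", utf8_two_byte_codepoint("\xC4\x85")); // U+0105 (decimal 261)
echo hex_dump_bytes("\xC4\x85"), "\n";                   // =C4=85
```

With byte semantics, "\xC3\200" is the two bytes C3 80, which decode to the
single code point U+00C0. If the same escapes are read as code points instead,
the string holds U+00C3 and U+0080 and the byte-range checks no longer see the
values they were written for.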