David Zülke wrote:
Interesting. I assume that was a weakness in the respective
implementation, right? Since

0xE0 " >

should never be regarded as a valid sequence, since neither " nor > is in
the range above 0x7F...

But that's what we are talking about: what to do with invalid sequences. The E0 lead byte says that the following 2 bytes are part of the UTF-8 character, so this is announced as a 3-byte sequence. Together these 3 bytes are not valid, so Microsoft chose to replace all 3 of them with some other character. And yes, Microsoft is notoriously bad at reading specs, but I don't think it is completely clear what to do here. What we do know is that we shouldn't do that.
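
To make the difference concrete, here is a minimal Python sketch of the two ways a decoder could treat 0xE0 " >. It is only an illustration, not the code of any real browser or of PHP: the names expected_length and decode and the swallow flag are made up for this example, and the resynchronizing branch is a simplification of what production decoders do.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def expected_length(lead: int) -> int:
    """Sequence length announced by a UTF-8 lead byte (0 if not a valid lead)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # stray continuation byte or a byte that never occurs in UTF-8


def decode(data: bytes, swallow: bool) -> str:
    """Decode UTF-8, replacing invalid input with U+FFFD.

    swallow=True  trusts the lead byte and discards every byte it claimed
                  (hypothetical model of the behaviour described above).
    swallow=False replaces only the offending byte and resumes at the next one.
    """
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        text = None
        if n:
            try:
                # Strict decoding also rejects overlong forms and surrogates.
                text = data[i:i + n].decode("utf-8")
            except UnicodeDecodeError:
                text = None
        if text is not None:
            out.append(text)
            i += n
        else:
            out.append(REPLACEMENT)
            i += max(n, 1) if swallow else 1
    return "".join(out)


payload = b'\xe0">'
swallowed = decode(payload, swallow=True)   # the quote and bracket disappear
resynced = decode(payload, swallow=False)   # the quote and bracket survive
builtin = payload.decode("utf-8", errors="replace")
```

With swallow=True the " and > are consumed along with the E0, so markup that was supposed to close an attribute or a tag is gone. With swallow=False only the E0 is replaced and the two ASCII bytes come through untouched, which is also what Python's built-in decoder does with errors="replace".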

-Rasmus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
