David Zülke wrote:
> Interesting. I assume that was a weakness in the respective
> implementation, right? After all,
>   0xE0 " >
> should never be regarded as a valid sequence, since neither " nor > is
> in the range above 0x7F...

But that's what we are talking about: what to do with invalid
sequences. The lead byte 0xE0 says that the following 2 bytes are part
of the same UTF-8 character, so this is announced as a 3-byte sequence.
Together these 3 bytes are not valid, so Microsoft chose to replace all
3 of them with some other character. And yes, Microsoft is notoriously
bad at reading specs, but I don't think it is completely clear what the
right behaviour is here. What we do know is that we shouldn't do that.
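
To make the difference concrete, here is a rough Python sketch of the two
strategies. It is only an illustration, not what any particular browser or
PHP actually does; the function names decode_swallowing and
decode_conservative are made up for this example. The first trusts the
length announced by the lead byte and replaces the whole span, the second
replaces only the offending byte and keeps the ASCII that follows.

```python
def expected_length(lead: int) -> int:
    """Sequence length announced by a UTF-8 lead byte, 0 if not a lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte that never appears in UTF-8


def decode_swallowing(data: bytes) -> str:
    """Hypothetical decoder that trusts the lead byte: an invalid sequence
    is replaced as a whole, so the bytes after the lead byte are lost."""
    out = []
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            out.append("\ufffd")
            i += 1
            continue
        chunk = data[i:i + n]
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            # All n announced bytes collapse into one replacement character.
            out.append("\ufffd")
        i += n
    return "".join(out)


def decode_conservative(data: bytes) -> str:
    """Replace only the bytes that are actually malformed; Python's built-in
    'replace' error handler leaves the following ASCII bytes alone."""
    return data.decode("utf-8", errors="replace")


payload = b'<a title="\xe0">x'
print(repr(decode_swallowing(payload)))    # the "> after 0xE0 disappears
print(repr(decode_conservative(payload)))  # the "> after 0xE0 survives
```

With the first approach the quote and the closing bracket vanish from the
output, which is exactly why it matters for anything that is later parsed
as markup.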
-Rasmus
--
PHP Internals - PHP Runtime Development Mailing List