On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev <smalys...@sugarcrm.com> wrote:

> Hi!
>
>> Ignoring 5.4 for a second, if you do this in 5.3:
>>
>> echo htmlspecialchars($string);
>> echo htmlspecialchars($string, NULL, "ISO-8859-1");
>> echo htmlspecialchars($string, NULL, "UTF-8");
>>
>> You will see that the first two output the escaped string with the
>> GB2312 bytes intact within it and the UTF-8 call returns false because
>> it correctly recognizes that GB2312 is not UTF-8. We don't have any such
>> check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
>> htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.
>>
>
> So the difference is that ISO8859-1 does not validate but UTF-8 validates?
> I'm not sure what GB2312 encoding does but isn't it dangerous to do
> htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also
> produce wrong result when used with wrong encoding?


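The EUC-CN encoding appears to stay compatible with ASCII by keeping both
bytes of every two-byte character outside the ASCII range (each byte is in
0xA1-0xFE). None of those bytes can be mistaken for &, <, >, " or ', so it
seems that htmlspecialchars() should work OK on GB2312 input: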

http://en.wikipedia.org/wiki/GB_2312#EUC-CN
http://php.net/manual/en/mbstring.supported-encodings.php
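
A quick way to check this, as a rough sketch rather than a definitive test:
the sample bytes below are my own assumption ("中文" encoded as EUC-CN), and
the variable names are only for illustration. It verifies that every byte of
the sample is outside the ASCII range and that htmlspecialchars() with a
single-byte charset escapes the markup but leaves those bytes alone.

```php
<?php
// Assumed sample: "中文" encoded as EUC-CN (GB2312), two bytes per character.
$gb2312 = "\xD6\xD0\xCE\xC4";
$string = "<b>" . $gb2312 . "</b>";

// Check the claim byte by byte: every byte should be in 0xA1-0xFE.
$all_high = true;
foreach (str_split($gb2312) as $byte) {
    if (ord($byte) < 0xA1 || ord($byte) > 0xFE) {
        $all_high = false;
    }
}

// With a single-byte charset, htmlspecialchars() only rewrites & < > " ',
// so the two-byte characters should pass through unchanged.
$escaped = htmlspecialchars($string, ENT_QUOTES, "ISO-8859-1");
$intact = strpos($escaped, $gb2312) !== false;

var_dump($all_high, $intact);
```

This only shows that the escaping itself is safe for this sample. It says
nothing about whether the page is then served with the right charset.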

Adam
