On Mon, Mar 12, 2012 at 3:52 AM, Stas Malyshev <smalys...@sugarcrm.com> wrote:
> Hi!
>
>> Ignoring 5.4 for a second, if you in 5.3 do this:
>>
>>   echo htmlspecialchars($string);
>>   echo htmlspecialchars($string, NULL, "ISO-8859-1");
>>   echo htmlspecialchars($string, NULL, "UTF-8");
>>
>> You will see that the first two output the escaped string with the
>> GB2312 bytes intact within it and the UTF-8 calls returns false because
>> it correctly recognizes that GB2312 is not UTF-8. We don't have any such
>> check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
>> htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.
>
> So the difference is that ISO8859-1 does not validate but UTF-8 validates?
> I'm not sure what GB2312 encoding does but isn't it dangerous to do
> htmlspecialchars() with wrong encoding? Wouldn't htmlentities() also
> produce wrong result when used with wrong encoding?

The EUC-CN encoding of GB2312 appears to ensure compatibility with ASCII by
keeping both bytes of each two-byte character outside the ASCII range, so it
seems that htmlspecialchars() should work OK:

http://en.wikipedia.org/wiki/GB_2312#EUC-CN
http://php.net/manual/en/mbstring.supported-encodings.php

Adam
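For anyone who wants to check the EUC-CN argument, here is a minimal sketch. It assumes the GB2312 text is EUC-CN encoded and writes the bytes for "中文" (D6 D0 CE C4) literally, so neither mbstring nor iconv is needed. The helper all_bytes_in_euc_range() is hypothetical, written only for this illustration, and it checks byte ranges rather than validating full EUC-CN.

```php
<?php
// "中文" as GB2312 in EUC-CN form, written as raw bytes:
// 中 = D6 D0, 文 = CE C4 (every byte is in 0xA1-0xFE).
$gb = "\xD6\xD0\xCE\xC4";
$input = "<b>" . $gb . " & co</b>";

// ISO-8859-1 is single-byte: every byte is accepted as a character,
// so only the ASCII specials &, <, >, " and ' are rewritten.
$latin1 = htmlspecialchars($input, ENT_QUOTES, 'ISO-8859-1');

// UTF-8 mode validates the input. 0xD6 starts a two-byte sequence but
// 0xD0 is not a continuation byte, so the input is invalid UTF-8 and
// (without ENT_IGNORE/ENT_SUBSTITUTE) an empty string comes back.
$utf8 = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Hypothetical helper: true if every byte lies in 0xA1-0xFE, i.e. no
// byte can be mistaken for an ASCII special character.
function all_bytes_in_euc_range(string $s): bool
{
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0xA1 || $b > 0xFE) {
            return false;
        }
    }
    return true;
}

echo bin2hex($latin1), "\n";  // escaped markup with D6D0CEC4 left intact
var_dump($utf8);              // empty string: rejected as invalid UTF-8
var_dump(all_bytes_in_euc_range($gb));
```

The empty-string result for invalid input is the behaviour documented for htmlspecialchars() in PHP 5.4 and later. Note that this only shows the GB2312 bytes survive escaping unchanged; it does not make ISO-8859-1 a correct label for that text in general.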