On 03/12/2012 12:10 AM, Stas Malyshev wrote:
> Hi!
>
>> What we really need is what we added in PHP 6. A runtime encoding ini
>> setting that is distinct from the output charset which we can use here.
>> That would allow people to fix all their legacy code to a specific
>> runtime encoding with a single ini setting instead of changing thousands
>> of lines of code. I propose that we add such a directive to 5.4.1 to
>> ease migration.
>
> One more charset INI setting? I'm not sure I like this. We have tons of
> INIs already, and adding a new one each time we change something makes
> both writing applications and configuring servers harder.
> But as the manual says, ISO-8859-1 and UTF-8 are the same for
> htmlspecialchars() - is it wrong? If yes, what exactly is the difference
> between old and new behavior? I tried to read #61354 but could make
> little sense out of it, it lacks expected result and I have a hard time
> understanding what is the problem there. Could you explain?

Yes, it is a bit hard to understand from the bug report because
bugs.php.net is all UTF-8, but we are talking about non-UTF-8 apps here.
This script should illustrate it ( https://gist.github.com/2020502 ):

    $gb2312 = iconv('UTF-8', 'GB2312', '我是测试');
    $string = "<pre><p>$gb2312</p></pre>";
    echo htmlspecialchars($string);

If you run that in PHP 5.3 you get:

    &lt;pre&gt;&lt;p&gt;���Dz���&lt;/p&gt;&lt;/pre&gt;

The garbage-like chars there (if you don't see them, see
https://gist.github.com/2020442 ) are the expected output: they are the
raw GB2312 bytes, passed through untouched.

In PHP 5.4 the output is nothing. The function recognizes that this is
not valid UTF-8 and dumps the entire string.

Ignoring 5.4 for a second, if you do this in 5.3:

    echo htmlspecialchars($string);
    echo htmlspecialchars($string, NULL, "ISO-8859-1");
    echo htmlspecialchars($string, NULL, "UTF-8");

you will see that the first two output the escaped string with the
GB2312 bytes intact within it, and the UTF-8 call returns false because
it correctly recognizes that GB2312 is not UTF-8. We don't have any such
check for 8859-1, so yes, saying UTF-8 and 8859-1 are the same for
htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.

And as expected, under 5.4, because the default is now the UTF-8
behaviour, only the second echo gives a result.

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
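
For anyone who wants to reproduce the difference without relying on the version-specific default charset, here is a minimal sketch that passes the flags and the charset explicitly. It is not taken from the bug report or the gists. The GB2312 bytes are hard-coded (they should be what the iconv() call in the script above produces) so that the iconv extension is not needed, and the variable names $asLatin1 and $asUtf8 are only illustrative. It compares against the empty string because that is what the current manual documents for invalid input.

```php
<?php
// GB2312 bytes for '我是测试', hard-coded so the sketch does not need iconv.
$gb2312 = "\xCE\xD2\xCA\xC7\xB2\xE2\xCA\xD4";
$string = "<pre><p>$gb2312</p></pre>";

// Explicit flags and charset, so the result does not depend on the
// defaults of whichever PHP version runs this.
$asLatin1 = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');
$asUtf8   = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');

// ISO-8859-1 has no validity check: the markup is escaped and the
// GB2312 bytes pass through untouched.
var_dump($asLatin1 === "&lt;pre&gt;&lt;p&gt;$gb2312&lt;/p&gt;&lt;/pre&gt;");

// The same bytes are not valid UTF-8, so the whole string is rejected
// and nothing is left to echo.
var_dump($asUtf8 === '');
```

Legacy code that relied on the old default could, under this reading, keep its previous output by naming the charset in each call, which is exactly the "thousands of lines" edit that a single ini setting would avoid.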