On 03/12/2012 12:10 AM, Stas Malyshev wrote:
> Hi!
> 
>> What we really need is what we added in PHP 6. A runtime encoding ini
>> setting that is distinct from the output charset which we can use here.
>> That would allow people to fix all their legacy code to a specific
>> runtime encoding with a single ini setting instead of changing thousands
>> of lines of code. I propose that we add such a directive to 5.4.1 to
>> ease migration.
> 
> One more charset INI setting? I'm not sure I like this. We have tons of
> INIs already, and adding a new one each time we change something makes
> both writing applications and configuring servers harder.
> But as the manual says, ISO-8859-1 and UTF-8 are the same for
> htmlspecialchars() - is it wrong? If yes, what exactly is the difference
> between old and new behavior? I tried to read #61354 but could make
> little sense out of it; it lacks an expected result and I have a hard time
> understanding what the problem is there. Could you explain?

Yes, it is a bit hard to understand from the bug report because
bugs.php.net is all UTF-8, but we are talking about non-UTF-8 apps here.

This script should illustrate it: ( https://gist.github.com/2020502 )

$gb2312 = iconv('UTF-8','GB2312','我是测试');
$string = "<pre><p>$gb2312</p></pre>";
echo htmlspecialchars($string);

If you run that in PHP 5.3 you get:

&lt;pre&gt;&lt;p&gt;���Dz���&lt;/p&gt;&lt;/pre&gt;

The garbage-like chars there (if you don't see them, see
https://gist.github.com/2020442) are the expected output: they are the
GB2312 bytes passed through untouched. In PHP 5.4 the output is empty.
The function recognizes that this is not valid UTF-8 and discards the
entire string.
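
If you want to reproduce this without iconv, here is a minimal sketch. It
assumes the GB2312 bytes are written as escapes (so the encoding of the
script file does not matter) and passes the flags and charset explicitly so
that the result does not depend on the defaults of whatever version you run
it on. The variable names are only for illustration.

```php
<?php
// GB2312 bytes for the same four-character test string, written as escapes.
$gb2312 = "\xCE\xD2\xCA\xC7\xB2\xE2\xCA\xD4";
$string = "<pre><p>$gb2312</p></pre>";

// Explicit UTF-8, which is what 5.4 uses when no charset is given:
// the input is not valid UTF-8, so the result is an empty string.
$utf8 = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');

// Explicit ISO-8859-1: no validity check, so only the markup is escaped
// and the GB2312 bytes come through unchanged.
$latin1 = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');

var_dump($utf8 === '');
var_dump(strlen($latin1) > 0);
```

Both var_dump() calls should print bool(true).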

Ignoring 5.4 for a second, if you do this in 5.3:

echo htmlspecialchars($string);
echo htmlspecialchars($string, NULL, "ISO-8859-1");
echo htmlspecialchars($string, NULL, "UTF-8");

You will see that the first two calls output the escaped string with the
GB2312 bytes intact within it, while the UTF-8 call returns an empty
string because it correctly recognizes that GB2312 is not UTF-8. We don't
have any such check for 8859-1, so yes, saying UTF-8 and 8859-1 are the
same for htmlspecialchars() is wrong for PHP 5.3 as well as for 5.4.

And as expected, under 5.4, because the default is now the UTF-8
behaviour, only the second echo gives a result.
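
Without a runtime encoding setting, the only userland way around this is to
pin the charset at every call site. A rough sketch of what that might look
like follows. The APP_CHARSET constant and the legacy_escape() helper are
hypothetical names made up for this example and are not part of PHP. Note
that this still means touching every htmlspecialchars() call in the app.

```php
<?php
// Hypothetical constant: the single-byte charset the legacy app has
// effectively been relying on (the old default).
define('APP_CHARSET', 'ISO-8859-1');

// Hypothetical helper: same as htmlspecialchars(), but with the charset
// pinned so the result does not change with the version's default.
function legacy_escape($string, $flags = ENT_COMPAT)
{
    return htmlspecialchars($string, $flags, APP_CHARSET);
}

// GB2312 bytes written as escapes, as a stand-in for legacy app data.
$gb2312  = "\xCE\xD2\xCA\xC7\xB2\xE2\xCA\xD4";
$escaped = legacy_escape("<pre><p>$gb2312</p></pre>");

// The markup is escaped and the GB2312 bytes are left alone.
var_dump($escaped !== '');
```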

-Rasmus
