On May 19, 2007, at 11:13 PM, Tomas Kuliavas wrote:
> 0xC4 and 0x85 are the UTF-8 bytes for LATIN SMALL LETTER A WITH
> OGONEK (ą).
>
> <?php
> var_dump("ą" == "\xC4\x85");
> echo "ą\n";
> echo "\xC4\x85";
> ?>
>
> If the script is written in UTF-8, I expect bool(true) from the
> var_dump() line.
var_dump("ą" == b"\xC4\x85");
This will give you what you want, if the script is written in UTF-8
and your runtime encoding is set to UTF-8.
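
A rough sketch of why the two literals differ, written so that it runs on a released PHP where every string is a byte string. It assumes that under unicode.semantics=on each \x escape in a Unicode literal denotes one code point (U+00C4, then U+0085) rather than one byte. The variable names are made up for illustration.

```php
<?php
// Hypothetical illustration; the names below are not from the test case.

// The two raw bytes that encode U+0105 (ą) in UTF-8, as a binary literal.
$binary = b"\xC4\x85";

// Assumption: a Unicode literal "\xC4\x85" means the code points U+00C4
// and U+0085. Spelled out as UTF-8 bytes, that sequence is C3 84 C2 85.
$twoCodePoints = b"\xC3\x84\xC2\x85";

var_dump(bin2hex($binary));            // string(4) "c485"
var_dump(bin2hex($twoCodePoints));     // string(8) "c384c285"
var_dump($binary === $twoCodePoints);  // bool(false)
```

The b prefix keeps the literal as bytes, so it can be decoded with the runtime encoding before it is compared with "ą".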
> <?php
> // This example uses UTF-8. Similar code is used for ISO-8859-2
> // through ISO-8859-16 decoding. UTF-8 decoding does not need
> // mapping tables and is written with PCRE.
> $s1 = "ą";
> $s2 = "\xC4\x85";
> echo str_replace($s2,'ą',$s1);
> ?>
>
> Expected result: ą
> Got: ą
>
> The test setup (php6.0-200705190630) uses a trimmed php.ini with only
> the unicode.semantics=on setting:
>
> unicode.fallback_encoding - no value
> unicode.filesystem_encoding - no value
> unicode.http_input_encoding - no value
> unicode.output_encoding - no value
> unicode.runtime_encoding - no value
> unicode.script_encoding - no value
> unicode.semantics - On
> unicode.stream_encoding - UTF-8
Why didn't you set any encoding settings?
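
Something along these lines in php.ini is what I would expect for UTF-8 sources. The directive names are the ones from your listing; the values are an assumption about your setup, not something I have tested against your snapshot.

```ini
; Assumed values for a script saved as UTF-8
unicode.semantics = On
unicode.script_encoding = UTF-8
unicode.runtime_encoding = UTF-8
unicode.output_encoding = UTF-8
```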
-Andrei
--
PHP Internals - PHP Runtime Development Mailing List