On May 19, 2007, at 11:13 PM, Tomas Kuliavas wrote:

0xC4 and 0x85 are the hex codes of LATIN SMALL LETTER A WITH OGONEK (ą, U+0105) in UTF-8.

<?php
var_dump("ą" == "\xC4\x85");
echo "ą\n";
echo "\xC4\x85";
?>

If the script is written in UTF-8, I expect bool(true) from the var_dump() line.
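For reference, the two bytes follow from the standard UTF-8 rule for code points U+0080 to U+07FF. The top five bits go into a 110xxxxx lead byte, and the low six bits go into a 10xxxxxx continuation byte. The following is a minimal sketch that assumes a released PHP (7 or later), where strings are plain byte strings. This is not the PHP 6 Unicode mode discussed in this thread. `utf8_encode_2byte` is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): encode a code point in the
// range U+0080..U+07FF as a two-byte UTF-8 sequence.
function utf8_encode_2byte(int $cp): string
{
    if ($cp < 0x80 || $cp > 0x7FF) {
        throw new InvalidArgumentException('code point needs a different UTF-8 length');
    }
    $lead = 0xC0 | ($cp >> 6);   // 110xxxxx: top five bits
    $cont = 0x80 | ($cp & 0x3F); // 10xxxxxx: low six bits
    return chr($lead) . chr($cont);
}

// U+0105 LATIN SMALL LETTER A WITH OGONEK
$bytes = utf8_encode_2byte(0x105);
echo bin2hex($bytes), "\n";         // c485
var_dump($bytes === "\xC4\x85");    // bool(true)
var_dump("\u{105}" === "\xC4\x85"); // bool(true); \u{...} escape needs PHP 7+
```

Under PHP 6 unicode.semantics, "\xC4\x85" in a Unicode string literal was not this byte pair. That is why the thread's comparison needs the b"" prefix.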

var_dump("ą" == b"\xC4\x85");

This will give you what you want if the script is written in UTF-8 and your runtime encoding is set to UTF-8.

<?php
// This example uses UTF-8. Similar code is used for ISO-8859-2 to
// ISO-8859-16 decoding. UTF-8 decoding does not need mapping tables
// and is written with PCRE.
$s1 = "ą";
$s2 = "\xC4\x85";
echo str_replace($s2,'&#261;',$s1);
?>

Expected result: &#261;
Got: ą
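The test script's comment says that UTF-8 decoding is written with PCRE. The code below is not that decoder. It is a minimal sketch of the same idea, limited to two-byte sequences, and it assumes PHP 7+ byte-string semantics rather than PHP 6 Unicode mode. `utf8_2byte_to_ncr` is a hypothetical name. Under those assumptions, the script's input produces the expected &#261;, because U+0105 is 261 in decimal.

```php
<?php
// Hypothetical helper: replace every two-byte UTF-8 sequence
// (lead byte 0xC2..0xDF, continuation byte 0x80..0xBF) with a decimal
// numeric character reference. Longer sequences are left untouched.
function utf8_2byte_to_ncr(string $s): string
{
    // No /u modifier: the pattern matches raw bytes, not code points.
    return preg_replace_callback(
        '/[\xC2-\xDF][\x80-\xBF]/',
        function (array $m): string {
            // Rebuild the code point from 5 payload bits + 6 payload bits.
            $cp = ((ord($m[0][0]) & 0x1F) << 6) | (ord($m[0][1]) & 0x3F);
            return '&#' . $cp . ';';
        },
        $s
    );
}

echo utf8_2byte_to_ncr("\xC4\x85"), "\n"; // &#261;
```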

The test setup (php6.0-200705190630) uses a trimmed php.ini with only the
unicode.semantics=on setting:

unicode.fallback_encoding - no value
unicode.filesystem_encoding - no value
unicode.http_input_encoding - no value
unicode.output_encoding - no value
unicode.runtime_encoding - no value
unicode.script_encoding - no value
unicode.semantics - On
unicode.stream_encoding - UTF-8

Why didn't you set any encoding settings?

-Andrei
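Andrei's reply above suggests setting both the script and runtime encodings to UTF-8. Following that suggestion, a php.ini fragment for the same snapshot might look like the example below. This is a sketch. The directive names are the ones listed in the test setup, and the UTF-8 values are assumptions. These unicode.* settings existed only in the PHP 6 development snapshots, which never became a release.

```ini
; Sketch for a php6.0 development snapshot; the values are assumptions.
unicode.semantics = On
unicode.script_encoding = UTF-8
unicode.runtime_encoding = UTF-8
unicode.output_encoding = UTF-8
```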
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php