 ID:               33793
 Updated by:       [EMAIL PROTECTED]
 Reported By:      lars dot jensen at careercross dot com
-Status:           Assigned
+Status:           Bogus
 Bug Type:         mbstring related
 Operating System: FreeBSD 5.3
 PHP Version:      5.1.0-dev
 Assigned To:      moriyoshi
 New Comment:

This is not a bug. The character at code point U+3231 is not part of
the coded character set called "Shift_JIS". It belongs to a similar
coded character set, "CP932", which for historical reasons is often
mistaken for "Shift_JIS". Specifying "CP932" everywhere you currently
specify "SJIS" will probably solve your problem.


Previous Comments:
------------------------------------------------------------------------

[2005-07-22 08:04:03] lars dot jensen at careercross dot com

Upgraded to PHP Version 5.1.0-dev via the posted link, but I get exactly
the same result as before. I don't have access to a Windows server, so
I haven't tested the Win32 version.

------------------------------------------------------------------------

[2005-07-21 12:47:29] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz

For Windows:

  http://snaps.php.net/win32/php5-win32-latest.zip

------------------------------------------------------------------------

[2005-07-21 10:19:24] [EMAIL PROTECTED]

Assigned to the maintainer.

------------------------------------------------------------------------

[2005-07-21 03:19:47] lars dot jensen at careercross dot com

Description:
------------
I am writing a class that converts UTF-8 input into SJIS using the
usual call:

$body = mb_convert_encoding($body, "SJIS", "UTF-8");

This usually works, but in testing so far I have identified three kanji
that the function fails to convert correctly, which causes mojibake.

The three UTF-8 characters are identified as follows:

chr(227).chr(136).chr(177)
chr(227).chr(136).chr(178)
chr(227).chr(136).chr(185)

which in SJIS correspond to:

chr(135).chr(138)
chr(135).chr(139)
chr(135).chr(140)

Reproduce code:
---------------
I created the following "quick'n'dirty" workaround, which surely isn't
optimal. It swaps each of the three characters for a placeholder before
the conversion and writes the expected SJIS bytes back afterwards:

$body = str_replace(chr(227).chr(136).chr(177), '#mojihack1#', $body);
$body = str_replace(chr(227).chr(136).chr(178), '#mojihack2#', $body);
$body = str_replace(chr(227).chr(136).chr(185), '#mojihack3#', $body);
$body = mb_convert_encoding($body, "SJIS", "UTF-8");
$body = str_replace('#mojihack1#', chr(135).chr(138), $body);
$body = str_replace('#mojihack2#', chr(135).chr(139), $body);
$body = str_replace('#mojihack3#', chr(135).chr(140), $body);

------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=33793&edit=1
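
For illustration, the maintainer's suggestion (name "CP932" instead of
"SJIS" as the target encoding) could be sketched as below. This is only
a minimal sketch: the helper name utf8_to_cp932() and the $samples array
are hypothetical and not part of the report, and it assumes an mbstring
build that accepts "CP932" as an encoding name. The loop prints each of
the three reported UTF-8 sequences next to its converted bytes in hex so
they can be compared with the SJIS bytes listed in the description.

```php
<?php
// Hypothetical helper, not from the bug report: convert UTF-8 text to
// CP932 (the Windows variant often mistaken for Shift_JIS) so that the
// vendor-extension characters are mapped, with no placeholder step.
function utf8_to_cp932($body)
{
    return mb_convert_encoding($body, 'CP932', 'UTF-8');
}

// The three characters named in the report (U+3231, U+3232, U+3239),
// written as the same UTF-8 bytes the reporter built with chr().
$samples = array("\xE3\x88\xB1", "\xE3\x88\xB2", "\xE3\x88\xB9");

foreach ($samples as $utf8) {
    // Show source and converted bytes in hex for manual comparison.
    echo bin2hex($utf8), ' -> ', bin2hex(utf8_to_cp932($utf8)), "\n";
}
```

If the output matches the byte pairs given in the description, the
str_replace() workaround should no longer be needed. Whatever reads the
converted data (mail headers, HTML charset declarations, database
connections) may also need to be told that the data is CP932.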