ID:               33793
 Updated by:       [EMAIL PROTECTED]
 Reported By:      lars dot jensen at careercross dot com
-Status:           Assigned
+Status:           Bogus
 Bug Type:         mbstring related
 Operating System: FreeBSD 5.3
 PHP Version:      5.1.0-dev
 Assigned To:      moriyoshi
 New Comment:

This is not a bug.

The character that corresponds to the codepoint U+3231 is not contained
in the coded character set called "Shift_JIS".

However, the similar coded character set "CP932", which is often
mistaken for "Shift_JIS" for historical reasons, does contain that
character.

Your problem will probably be solved by simply specifying "CP932"
wherever you currently specify "SJIS".
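A minimal sketch of the suggested change, assuming the mbstring extension is loaded: the only difference from the reporter's call is the target encoding name. The helper name `to_cp932` is hypothetical and exists only for illustration.

```php
<?php
// Hypothetical helper: convert UTF-8 text to CP932 (Microsoft's variant of
// Shift_JIS), which includes NEC special characters such as U+3231.
function to_cp932($utf8)
{
    return mb_convert_encoding($utf8, "CP932", "UTF-8");
}

// The three characters from the report, written as UTF-8 byte sequences.
$samples = array("\xE3\x88\xB1", "\xE3\x88\xB2", "\xE3\x88\xB9");

foreach ($samples as $s) {
    // Print the resulting CP932 bytes in hex.
    echo bin2hex(to_cp932($s)), "\n";
}
```

With "CP932" as the target, these should come out as the byte pairs 0x87 0x8A, 0x87 0x8B and 0x87 0x8C that the reporter expected, without any placeholder substitution.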





Previous Comments:
------------------------------------------------------------------------

[2005-07-22 08:04:03] lars dot jensen at careercross dot com

I upgraded to PHP 5.1.0-dev via the posted link, but I get exactly the
same result as before.

I don't have access to a Windows server, so I haven't tested the Win32
version.

------------------------------------------------------------------------

[2005-07-21 12:47:29] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5-win32-latest.zip



------------------------------------------------------------------------

[2005-07-21 10:19:24] [EMAIL PROTECTED]

Assigned to the maintainer.

------------------------------------------------------------------------

[2005-07-21 03:19:47] lars dot jensen at careercross dot com

Description:
------------
I am writing a class that converts UTF-8 input into SJIS using the
usual function:

$body = mb_convert_encoding($body, "SJIS", "UTF-8");

This usually works, but in testing so far I have identified three kanji
that this function fails to convert correctly, which causes mojibake.

The three characters have the following UTF-8 byte sequences:
chr(227).chr(136).chr(177)
chr(227).chr(136).chr(178)
chr(227).chr(136).chr(185)

which correspond to the following bytes in SJIS:
chr(135).chr(138)
chr(135).chr(139)
chr(135).chr(140)
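To connect these byte values with the codepoint U+3231 mentioned in the reply above, the following sketch decodes a three-byte UTF-8 sequence by hand. The helper `utf8_3byte_to_codepoint` is hypothetical and assumes a well-formed three-byte sequence (lead byte 0xE0 to 0xEF); it does no validation.

```php
<?php
// Hypothetical helper: decode one well-formed three-byte UTF-8 sequence
// (1110xxxx 10xxxxxx 10xxxxxx) into its Unicode codepoint. No validation.
function utf8_3byte_to_codepoint($b1, $b2, $b3)
{
    return (($b1 & 0x0F) << 12)   // low 4 bits of the lead byte
         | (($b2 & 0x3F) << 6)    // low 6 bits of the first continuation byte
         |  ($b3 & 0x3F);         // low 6 bits of the second continuation byte
}

// Byte triples taken from the report.
$triples = array(array(227, 136, 177), array(227, 136, 178), array(227, 136, 185));

foreach ($triples as $t) {
    printf("U+%04X\n", utf8_3byte_to_codepoint($t[0], $t[1], $t[2]));
}
// Prints U+3231, U+3232 and U+3239
```

These are the parenthesized ideographs ㈱, ㈲ and ㈹, which belong to the NEC special characters included in CP932 but not in Shift_JIS proper.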

Reproduce code:
---------------
I created the following quick-and-dirty workaround, which is surely not
optimal:

// Replace the three problem characters with ASCII placeholders.
$body = str_replace(chr(227).chr(136).chr(177), '#mojihack1#', $body);
$body = str_replace(chr(227).chr(136).chr(178), '#mojihack2#', $body);
$body = str_replace(chr(227).chr(136).chr(185), '#mojihack3#', $body);

// Convert the rest of the text; the placeholders pass through unchanged.
$body = mb_convert_encoding($body, "SJIS", "UTF-8");

// Substitute the raw SJIS byte pairs back in for the placeholders.
$body = str_replace('#mojihack1#', chr(135).chr(138), $body);
$body = str_replace('#mojihack2#', chr(135).chr(139), $body);
$body = str_replace('#mojihack3#', chr(135).chr(140), $body);




------------------------------------------------------------------------

