Re: Shift_Jis for generated japanese output?

Yoshiki Hayashi 23 Mar 2004 07:01:50 -0000

André Malo <[EMAIL PROTECTED]> writes:

>> Not really.  Iso-2022-jp is the most auto-detection friendly
>> encoding because of infamous escape sequence but shift_jis
>> would be OK, too.  I'm +-0 on conversion at the moment
>> because it hasn't caused me much trouble so far.
>
> Well, then we should stick with it.
>
> I find it just annoying that the recoding is not stable (i.e. the xalan
> serializer output differs from version to version and depending on other
> things like moon phases or so :).


In my experience, it was the Java version that mattered.
Since I upgraded to JDK 1.4, I haven't had problems with
the encoding.  The instability happens because iso-2022-jp
is a stateful encoding: an escape sequence determines how
the bytes that follow it are interpreted.  After one escape
sequence, bytes are interpreted as ASCII characters, and
after another, they are interpreted as Japanese characters.
Because of this, the output can contain redundant escape
sequences, such as switching to another state and then
immediately switching back to the previous one.  There were
lots of these sequences when I was using JDK 1.2.  I'm
hoping this won't happen anymore now that all files have
been re-encoded with the newer JDK.  If it happens again, I
would give +1 for changing the generated files to shift_jis.
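
To make the redundant-escape problem concrete, here is a small
sketch.  It uses Python's built-in iso2022_jp codec rather than
the Java/Xalan serializer discussed in this thread, so it only
illustrates the encoding itself, not what any JDK version emits.
The sample string and the variable names are made up for the
illustration.

```python
ESC = b"\x1b"
TO_JIS = ESC + b"$B"    # escape sequence: switch to JIS X 0208
TO_ASCII = ESC + b"(B"  # escape sequence: switch back to ASCII

# Hypothetical sample text: "A", two kanji, "B".
text = "A\u65e5\u672cB"

# What a well-behaved encoder produces: one switch in, one switch out.
clean = text.encode("iso2022_jp")

# Simulate a sloppy encoder: every switch to JIS is preceded by a
# pointless round trip (to JIS, back to ASCII, to JIS again).
noisy = clean.replace(TO_JIS, TO_JIS + TO_ASCII + TO_JIS)

# Both byte strings decode to the same text ...
same_text = clean.decode("iso2022_jp") == noisy.decode("iso2022_jp")

# ... but the bytes differ, so a byte-level diff reports a change.
same_bytes = clean == noisy

print(clean)
print(noisy)
print(same_text, same_bytes)
```

Two files that differ only in this way are textually identical
but show up as modified in version control, which matches the
unstable-recoding symptom described above.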

> (though... actually I don't know if shift_jis would make the things better)

Yes, shift_jis would make it easier because it's a plain,
stateless 8-bit character encoding scheme with no escape
sequences.  There is only one way to encode a character.
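
As a rough illustration of that point, the sketch below encodes
a sample string with Python's built-in shift_jis codec (again,
not the Java serializer from this thread; the sample string and
names are assumptions made for the example).  The output
contains no escape bytes, and decoding and re-encoding gives
back exactly the same bytes.

```python
ESC = b"\x1b"

# Hypothetical sample text: "A", two kanji, "B".
text = "A\u65e5\u672cB"

sjis = text.encode("shift_jis")

# ASCII characters take one byte, each kanji takes two bytes,
# and there is no mode switching in between.
has_escape = ESC in sjis

# A decode/encode round trip reproduces the bytes exactly,
# because there is no encoder state that could vary.
round_trip = sjis.decode("shift_jis").encode("shift_jis")

print(sjis)
print(len(sjis), has_escape, round_trip == sjis)
```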

-- 
Yoshiki Hayashi

