Re: Japanese transformation is not stable

Hiroaki KAWAI 27 Jul 2004 20:04:15 -0000

> > Hmm. It still happens that different JREs (?) produce different iso-2022-jp
> > output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs).
> 
> Well, at least mine removes bogus escape sequences and
> produce more desirable output but yeah, it still happens.


Over the last few months, I've run into some bugs in the Sun JRE's 
implementation of the iso-2022-jp charset converter, but I think the 
converter will be stable soon. 
I'm working on the input XML files, and I'm not watching the generated 
HTML files. I feel the diffs in the HTML files are less important than 
those in the XML files, so I can just ignore the diffs in the generated 
HTML files for now.

Well, I don't understand what harm the diffs do to us, so could I ask 
for the reasons?
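
To illustrate why such diffs can be harmless: two iso-2022-jp byte 
streams can differ only in redundant escape sequences and still decode to 
the same text. Below is a minimal sketch in Python (used only for brevity; 
it does not reproduce the behaviour of any JRE converter). The byte 
strings and the helper name same_text are made up for this example.

```python
# Escape sequences defined by ISO-2022-JP.
ESC_JIS = b"\x1b$B"    # switch to JIS X 0208
ESC_ASCII = b"\x1b(B"  # switch back to ASCII

# The same two hiragana characters, encoded in two ways.
# 'compact' stays in JIS X 0208 mode for both characters.
compact = ESC_JIS + b'$"$$' + ESC_ASCII
# 'redundant' switches back to ASCII and into JIS X 0208 again in between.
redundant = ESC_JIS + b'$"' + ESC_ASCII + ESC_JIS + b"$$" + ESC_ASCII


def same_text(a: bytes, b: bytes, charset: str = "iso2022_jp") -> bool:
    """Compare two byte streams by their decoded text, not byte by byte."""
    return a.decode(charset) == b.decode(charset)


if __name__ == "__main__":
    print("bytes equal:", compact == redundant)
    print("text equal: ", same_text(compact, redundant))
```

A byte-level diff reports a change here, while a comparison of the decoded 
text does not.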


> > I'd suggest to switch the transformation finally to shift_jis, which is more
> > stable (because there are none of these problematic escape sequences).
> 
> I'd rather use euc-jp than shift_jis.  For one thing,
> shift_jis is a nightmare for auto detection since almost any
> byte sequence can represent a valid character.  If I choose
> from the three major character encoding schemes in Japan, I
> always choose euc-jp.  It doesn't have the quirks sjis has.  The
> fact that the current one uses iso-2022-jp is just for legacy
> reasons.

IMHO, whichever charset we choose, we will face this kind of problem to 
some degree. 
# I myself prefer UTF-8. :-)
## Because it supports a wide range of characters. 

But shift_jis is actually a worse choice, because there are well-known 
issues around the Shift_JIS and CP932 charsets. 
The alias definitions have changed repeatedly between Java releases.
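
One example of the kind of Shift_JIS / CP932 issue meant here: the same 
two bytes decode to different Unicode characters depending on which of 
the two mappings a runtime picks for the name. A minimal sketch using 
Python's built-in codecs (an assumption for illustration only; the Java 
alias history itself is not reproduced here):

```python
# The byte pair 0x81 0x60 is the "wave dash" position in JIS X 0208.
WAVE_DASH_BYTES = b"\x81\x60"

# The standard Shift_JIS mapping yields U+301C WAVE DASH.
as_shift_jis = WAVE_DASH_BYTES.decode("shift_jis")
# The Microsoft CP932 mapping yields U+FF5E FULLWIDTH TILDE.
as_cp932 = WAVE_DASH_BYTES.decode("cp932")

if __name__ == "__main__":
    print("shift_jis: U+%04X" % ord(as_shift_jis))
    print("cp932:     U+%04X" % ord(as_cp932))
    print("same character:", as_shift_jis == as_cp932)
```

If a charset alias silently switches between the two mappings, text that 
passes through such a converter changes without any change to the input.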

I have no strong preference for which charset to use (as long as it is 
not shift_jis).


---Hiroaki Kawai

