> > Hmm. It still happens, that different JREs (?) produce different iso-2022-jp
> > output (i.e. any time someone builds all and diffs, he gets .ja.jis diffs.
>
> Well, at least mine removes bogus escape sequences and
> produce more desirable output but yeah, it still happens.

Over the last few months, I've run into some bugs in the iso-2022-jp charset converter of the Sun JRE, but I think the converter will be stable soon.

I'm working on the input XML files, and I'm not watching the generated HTML files. I feel the diffs of the HTML files are less important than those of the XML files, so I can just ignore the diffs of the generated HTML files for now. I don't understand what harm the diffs do to us, so may I ask for the reasons?

> > I'd suggest to switch the transformation finally to shift_jis, which is more
> > stable (because there are none of these problematic escape sequences).
>
> I'd rather use euc-jp than shift_jis. For one thing,
> shift_jis is a nightmare for auto detection since almost all
> byte sequence can represent a valid character. If I choose
> from three major character encoding scheme in Japan, I
> always choose euc-jp. It doesn't have quirks sjis has. The
> fact that current one uses iso-2022-jp is just from legacy
> reasons.

IMHO, whichever charset we choose, we will face this kind of problem to some degree.
# I myself prefer UTF-8. :-)
## Because it supports a wide range of characters.

But shift_jis is actually the worse choice, because there are well-known issues around the Shift_JIS and CP932 charsets. The alias definitions have changed repeatedly between Java releases.

I have no strong preference about which charset to use, as long as it is not shift_jis.

---Hiroaki Kawai
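
For anyone who wants to compare what their own JRE produces, here is a minimal sketch that dumps the bytes of a short Japanese string in each of the charsets discussed above. The class name `CharsetDemo`, its helper methods, and the sample string (U+65E5 U+672C) are only illustrative and not part of our build. The iso-2022-jp line is the one to watch: it is the only one of the four that wraps the text in escape sequences, and, as noted above, the exact bytes may differ between JREs.

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    // Render bytes as space-separated hex so escape sequences are visible.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode the text with the named charset, or report that this JRE lacks it.
    static String encodeHex(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return "(not supported by this JRE)";
        }
        return toHex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        // Sample text (an assumption for illustration): the two kanji of "Nihon".
        String text = "\u65E5\u672C";
        String[] names = {"ISO-2022-JP", "EUC-JP", "Shift_JIS", "UTF-8"};
        for (String name : names) {
            System.out.println(name + ": " + encodeHex(text, name));
        }
    }
}
```

Running this under two different JREs and diffing the output should show whether the iso-2022-jp converter is the source of the .ja.jis differences.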