Michiel Meeuwissen wrote:
I think character conversion should be completely left to the xml-parser.

I agree.


AFAIK you can get Strings out of it. If XMLReader is now fed with a Reader
rather than with an InputStream, then that is an error I think, because XML
documents are byte arrays, not strings.

Are you saying that the Reader should be passed directly to the XML parser? I think so too. It is strange that the file is passed on through so many classes, but I didn't feel like rewriting all of this. In the old situation the result of the import depended on the file.encoding setting of the virtual machine, which gives "random" results: while testing on 4 machines we found 4 different settings, and on the one machine set to ASCII the import went wrong. That's why I thought it was wise to at least remove this random behaviour. Setting the default to ISO-8859-1 seemed the safest to me (no "conversion" takes place before the XML reader reads the files).
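
To illustrate the point about leaving character conversion to the parser, here is a minimal sketch, not the actual import code. The class name ParseBytesExample and the helper rootText are made up for this mail, and it uses the standard JAXP DOM API rather than XMLReader only to keep it short. The parser gets raw bytes through an InputStream and picks the encoding from the XML declaration, so the JVM's file.encoding setting should not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the import code discussed here.
public class ParseBytesExample {

    // Hands the raw bytes to the parser; the parser itself decides how to
    // decode them, based on the encoding named in the XML declaration.
    static String rootText(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(in));
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<name>caf\u00e9</name>";

        // Same document, stored on disk in two different encodings.
        byte[] latin1 = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);

        // Both byte arrays should yield the same text, whatever file.encoding is.
        System.out.println(rootText(latin1).equals(rootText(utf8)));
    }
}
```

No Reader is created anywhere in this sketch, so there is no place where a default charset could sneak in.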


UTF-8 is not 2 bytes per letter, but 1 or more bytes per letter (I
think up to 4 or 5 bytes).

I know. It is difficult to explain by mail how the current process works and how I think it can go wrong when setting it to UTF-8.
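
A small sketch may help anyway. It is only an illustration under my own assumptions, not the current import process, and the class EncodingRoundTrip and its two helpers are hypothetical names. It shows that UTF-8 uses a variable number of bytes per character, and that decoding arbitrary bytes as ISO-8859-1 and encoding them again returns the same bytes, while doing the same with UTF-8 can alter bytes that are not valid UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class, not part of the import code discussed here.
public class EncodingRoundTrip {

    // Number of bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if decoding the bytes with cs and encoding them again
    // gives back exactly the same bytes.
    static boolean survivesRoundTrip(byte[] raw, Charset cs) {
        return Arrays.equals(raw, new String(raw, cs).getBytes(cs));
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // 1 byte
        System.out.println(utf8Length("\u00e9"));       // 2 bytes (e with acute accent)
        System.out.println(utf8Length("\u20ac"));       // 3 bytes (euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 bytes (a character outside the BMP)

        // "caf" followed by 0xE9, which is e-acute in ISO-8859-1
        // but an incomplete sequence in UTF-8.
        byte[] latin1Bytes = {'c', 'a', 'f', (byte) 0xE9};

        // ISO-8859-1 maps every byte to exactly one character and back.
        System.out.println(survivesRoundTrip(latin1Bytes, StandardCharsets.ISO_8859_1)); // true

        // Decoded as UTF-8, the malformed byte becomes U+FFFD and the original is lost.
        System.out.println(survivesRoundTrip(latin1Bytes, StandardCharsets.UTF_8));      // false
    }
}
```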


Jaco



