I think ISO-8859-1 is wrong, because XML is encoded as UTF-8 by default; that default can only be overridden by the encoding attribute of the <?xml ... ?> declaration.
The characters are passed through the following classes:
Reader -> BufferedReader -> StringBuffer -> String -> StringReader -> XMLReader
I think it is best to leave the character conversion (combining the bytes of a multi-byte sequence into one character) to the XMLReader. Using ISO-8859-1 makes sure the first five classes do no conversion, because ISO-8859-1 maps every byte to exactly one character. I think the XMLReader will use UTF-8 by default if no encoding is specified. We have been testing with an XML file that has UTF-8 specified, and it does the conversion like it should.
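To illustrate the point about ISO-8859-1 (a minimal sketch, not our actual code; the class and method names here are made up for the example): decoding bytes as ISO-8859-1 is a lossless one-to-one byte-to-char mapping, so the Reader chain passes the raw bytes through untouched and the XML parser can still do the real decoding later.

```java
import java.nio.charset.StandardCharsets;

public class Latin1PassThrough {
    // "é" in UTF-8 is two bytes: 0xC3 0xA9. Decoding them as ISO-8859-1
    // yields two chars (U+00C3, U+00A9) -- one char per byte, nothing lost.
    public static String decodeLatin1(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8 = "é".getBytes(StandardCharsets.UTF_8); // {0xC3, 0xA9}
        String latin1 = decodeLatin1(utf8);
        System.out.println(latin1.length());        // 2: one char per byte
        System.out.println((int) latin1.charAt(0)); // 195 (0xC3), byte preserved
    }
}
```

Re-encoding that String as ISO-8859-1 gives back the original bytes exactly, which is why the five classes in the chain above cannot corrupt anything.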
If the reader on the FileInputStream is set to UTF-8, it also seems to work. In that case the XMLReader probably sees that the special characters it reads have values above 255 and does no further conversion. But I think it can go wrong in special cases: if the stream converts two bytes into one character with a value between 128 and 255, the XMLReader will do another conversion, and that will give wrong results. But I haven't tested this situation.
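The double-conversion risk can be sketched like this (a hypothetical illustration, not a test of the actual XMLReader: the second decode stands in for the parser decoding a second time). A character that was already decoded from UTF-8 into the 128-255 range is not a valid UTF-8 sequence on its own, so decoding it again destroys it.

```java
import java.io.UnsupportedEncodingException;

public class DoubleDecode {
    // Decode UTF-8 once, then run the resulting 128-255 range character
    // through a second UTF-8 decode, as a parser might if it assumes the
    // input is still raw bytes.
    public static String decodeTwice(byte[] utf8Bytes) throws UnsupportedEncodingException {
        String once = new String(utf8Bytes, "UTF-8");  // e.g. "é" -> U+00E9 (233)
        byte[] raw = once.getBytes("ISO-8859-1");      // back to the single byte 0xE9
        return new String(raw, "UTF-8");               // 0xE9 alone is invalid UTF-8
    }

    public static void main(String[] args) throws Exception {
        String twice = decodeTwice("é".getBytes("UTF-8"));
        System.out.println((int) twice.charAt(0)); // 65533 (U+FFFD): the data is lost
    }
}
```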
The change I made fixed the problem we encountered on a machine that had file.encoding set to ASCII. This means that if you use FileReader to read the file, it will break all non-ASCII characters and will only put characters with values 0 to 127 in the BufferedReader. Other file.encoding settings (UTF-8, ISO-8859-1, CP125?) also seemed to work in the old situation.
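The fix boils down to this (a sketch; the temp file just stands in for the real document): FileReader always uses the platform default charset (file.encoding), so its result depends on the machine, while an explicit InputStreamReader pins the charset and reads the same bytes everywhere.

```java
import java.io.*;

public class ExplicitCharset {
    public static void main(String[] args) throws IOException {
        // Write one non-ASCII byte (0xE9, "é" in ISO-8859-1) to a temp file.
        File f = File.createTempFile("doc", ".xml");
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(0xE9);
        }
        // new FileReader(f) would decode with file.encoding, so on an ASCII
        // machine the 0xE9 byte gets mangled. Naming the charset avoids that:
        try (Reader r = new InputStreamReader(new FileInputStream(f), "ISO-8859-1")) {
            System.out.println(r.read()); // 233, regardless of file.encoding
        }
        f.delete();
    }
}
```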
Jaco
