Jaco de Groot <[EMAIL PROTECTED]> wrote:

> The characters are passed on to the following classes:
>
> Reader -> BufferedReader -> StringBuffer -> String -> StringReader ->
> XMLReader
>
> I think it is best to leave the character conversion (converting 2
> characters to one character) to the XMLReader. Using ISO-8859-1 will make
> sure the first 5 classes do no conversions. I think the XMLReader will use
> UTF-8 by default if no encoding is specified. We have been testing with an
> xml file that has UTF-8 specified, and it does the conversion like it
> should.
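A minimal sketch (standalone, not from the MMBase code, and using the modern java.nio.charset API for brevity) of why ISO-8859-1 is the byte-transparent choice for that Reader chain: every byte value maps one-to-one to the char with the same numeric value, so the bytes reach the XMLReader unchanged, and only a UTF-8 decode actually collapses a multi-byte sequence into one character:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Latin1PassThrough {

    // Read all chars from a Reader into a String (a stand-in for the
    // BufferedReader -> StringBuffer -> String steps in the chain).
    static String slurp(Reader in) throws Exception {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // 0xC3 0xA9 is the UTF-8 encoding of 'é' (U+00E9).
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};

        // Decoding as ISO-8859-1 maps each byte to the char with the
        // same numeric value: no multi-byte sequence is collapsed.
        String s = slurp(new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1));
        System.out.println(s.length());        // 2: one char per byte
        System.out.println((int) s.charAt(0)); // 195 (0xC3)
        System.out.println((int) s.charAt(1)); // 169 (0xA9)

        // Decoding as UTF-8 performs the 2-bytes-to-1-char conversion,
        // which is the step being left to the XMLReader.
        String t = slurp(new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));
        System.out.println(t.length());        // 1
        System.out.println((int) t.charAt(0)); // 233 ('é')
    }
}
```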
I think character conversion should be left completely to the XML parser.
AFAIK you can get Strings out of it. If XMLReader is now fed with a Reader
rather than with an InputStream, then I think that is an error, because XML
documents are byte arrays, not strings.

> If the FileInputStream is set to UTF-8 it seems to work also. In this case
> the XMLReader will probably see that the special characters it reads have a
> value higher than 256 and will do no conversion. But I think in special
> cases it can go wrong. If the FileInputStream converts 2 characters to 1
> character with a value between 127 and 256, the XMLReader will do another
> conversion and that will give wrong results. But I haven't tested this
> situation.

UTF-8 is not 2 bytes per letter, but 1 or more bytes per letter (up to 4
bytes per character).

> The change I made fixed the problem we encountered on a machine that had
> file.encoding set to ASCII. This means that if you use FileReader to read

I can understand that you changed something, but I only wonder if the fix
was done in the right spot. If XMLBasicReader is reading files as if they
were Strings, then that should be changed. All XML files should be written
as UTF-8 as far as I'm concerned.

Just asking, because I saw this 'iso-8859-1' used for an intrinsically (at
least by default) UTF-8 medium such as XML; I did not explore the matter
more than that.

Michiel
--
Michiel Meeuwissen  Mediapark C101 Hilversum  +31 (0)35 6772979
nl_NL eo_XX en_US  mihxil' [] ()
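P.S. To make the "1 or more bytes per letter" point concrete, here is a small standalone sketch (my own illustration, not MMBase code; the class and method names are made up) showing how many bytes UTF-8 needs for a few sample characters:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {

    // Number of bytes UTF-8 needs to encode the given string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1 byte  (U+0041, ASCII)
        System.out.println(utf8Length("\u00E9")); // 2 bytes (U+00E9, 'é')
        System.out.println(utf8Length("\u20AC")); // 3 bytes (U+20AC, '€')
        // U+1D11E (musical G clef) lies outside the BMP: a surrogate
        // pair in a Java String, 4 bytes in UTF-8.
        System.out.println(utf8Length("\uD834\uDD1E")); // 4 bytes
    }
}
```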
