Jaco de Groot <[EMAIL PROTECTED]> wrote:
> > The characters are passed on to the following classes:
> 
> Reader -> BufferedReader -> StringBuffer -> String -> StringReader -> 
> XMLReader
> 
> I think it is best to leave the character conversion (converting 2 
> characters to one character) to the XMLReader. Using ISO-8859-1 will make 
> sure the first 5 classes will do no conversions. I think the XMLReader will 
> use UTF-8 by default if no encoding is specified. We have been testing with 
> an xml file that has UTF-8 specified and it does the conversion like it 
> should.

I think character conversion should be left completely to the XML parser.
AFAIK you can get Strings out of it. If XMLReader is now fed with a Reader
rather than with an InputStream, then I think that is an error, because XML
documents are byte arrays, not strings.
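
As a rough sketch of what I mean: hand the parser an InputSource that wraps the raw byte stream, so that it can read the encoding declaration and do the decoding itself. The class name ByteStreamParse and the helper textOf are made up for this mail, and I am assuming the SAX parser that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any existing code base.
class ByteStreamParse {

    /** Parses XML given as raw bytes and returns all character data found. */
    static String textOf(byte[] xmlBytes) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            // A byte stream goes in: the parser inspects the XML declaration
            // and picks the decoder itself. No Reader is involved.
            reader.parse(new InputSource(new ByteArrayInputStream(xmlBytes)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String latin1Doc =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>";
        String utf8Doc =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf\u00e9</a>";

        // The same text, stored in two different encodings on "disk".
        byte[] latin1Bytes = latin1Doc.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = utf8Doc.getBytes(StandardCharsets.UTF_8);

        // Both should come out of the parser as the same Java String.
        System.out.println(textOf(latin1Bytes).equals("caf\u00e9"));
        System.out.println(textOf(utf8Bytes).equals("caf\u00e9"));
    }
}
```

The calling code never names an encoding here; the only place it is stated is the XML declaration inside the document.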


> If the FileInputStream is set to UTF-8 it seems to work also. In this case 
> the XMLReader will probably see that the special characters it reads have a 
> value higher than 256 and will do no conversion. But I think in special 
> cases it can go wrong. If the FileInputStream converts 2 characters to 1 
> character with a value between 127 and 256 the XMLReader will do another 
> conversion and that will give wrong results. But I haven't tested this 
> situation.

UTF-8 is not 2 bytes per character, but a variable 1 to 4 bytes per character
(the original design allowed up to 6, but the current definition stops at 4).
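
A quick way to check the variable length, as a minimal sketch. The class Utf8Lengths and the helper utf8Length are names invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: counts the UTF-8 bytes of a single code point.
class Utf8Lengths {

    /** Returns how many bytes the given Unicode code point takes in UTF-8. */
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length('a'));      // 1 (plain ASCII)
        System.out.println(utf8Length(0x00E9));   // 2 (e with acute accent)
        System.out.println(utf8Length(0x20AC));   // 3 (euro sign)
        System.out.println(utf8Length(0x1F600));  // 4 (outside the BMP)
    }
}
```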

> The change I made fixed the problem we encountered on a machine that had 
> file.encoding set to ASCII. This means that if you use FileReader to read 

I can understand that you changed something, but I only wonder whether the fix
was made in the right spot. If XMLBasicReader is reading files as if they were
Strings, then that should be changed. All XML files should be written as UTF-8
as far as I'm concerned. I am just asking because I saw this 'iso-8859-1' in an
intrinsically (at least by default) UTF-8 medium such as XML; I did not explore
the matter more than that.
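
To make the concern concrete, here is a small sketch that parses the same UTF-8 bytes twice: once as a byte stream, once through an ISO-8859-1 Reader. It assumes the SAX parser bundled with the JDK; ReaderVersusStream and textOf are names invented for this example and say nothing about what XMLBasicReader actually does. As far as I know, a SAX parser that is given a character stream uses it as-is and disregards the encoding declaration, so the second variant should hand back two Latin-1 characters instead of one accented letter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class comparing byte-stream and Reader input.
class ReaderVersusStream {

    /** Parses the given source and returns all character data found. */
    static String textOf(InputSource source) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // One accented letter, stored as two bytes (0xC3 0xA9) in UTF-8.
        byte[] utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00e9</a>"
            .getBytes(StandardCharsets.UTF_8);

        // Variant 1: the parser gets the bytes and decodes them itself.
        String viaStream =
            textOf(new InputSource(new ByteArrayInputStream(utf8)));

        // Variant 2: the bytes are first turned into chars as ISO-8859-1,
        // so the parser never sees the original bytes.
        String viaLatin1Reader = textOf(new InputSource(
            new InputStreamReader(new ByteArrayInputStream(utf8),
                                  StandardCharsets.ISO_8859_1)));

        System.out.println(viaStream.length());        // one character
        System.out.println(viaLatin1Reader.length());  // two characters
        System.out.println(viaLatin1Reader.equals("\u00c3\u00a9"));
    }
}
```

If the Reader variant appears to work in practice, something further down the chain must be turning the characters back into bytes before the real decoding happens, and that is the spot I would look at.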

Michiel



-- 
Michiel Meeuwissen 
Mediapark C101 Hilversum  
+31 (0)35 6772979
nl_NL eo_XX en_US
mihxil'