Either way, you're probably right that entering it as a bug makes sense. That way we can track it and investigate further.

G
On Nov 14, 2010, at 8:03 PM, Greg Brown wrote:

> The problem is that, even if the PI specifies UTF-8 for example, the file
> itself may be saved with a different encoding (so they may not match).
>
> On Nov 14, 2010, at 8:00 PM, Niclas Hedhman wrote:
>
>> On Mon, Nov 15, 2010 at 8:52 AM, Greg Brown <[email protected]> wrote:
>>>> Doesn't the XML deserializer you use just work correctly if you pass
>>>> an InputStream instead of a Reader??
>>>
>>> Actually, I think a Reader would work but we don't currently expose that
>>> API. We use javax.xml.stream.XMLInputFactory#createXMLStreamReader() to
>>> process the XML, which takes an InputStream as an argument. What we should
>>> probably do is allow the caller to specify the character set to read (there
>>> is another version of createXMLStreamReader() that takes both an
>>> InputStream and a java.nio.charset.Charset).
>>
>> That is incorrect. XML specification says that the <?xml> processing
>> instruction is in (IIRC) ASCII and it contains the encoding of the
>> rest of the document., such as <?xml version="1.0" encoding="UTF-8"
>> ?>, and compliant parsers should understand this. So, for instance, if
>> the document is in UTF-16, the <?xml?> PI is NOT, and a regular text
>> editor would have problem with handling that. For UTF-8, ISO-8859-X
>> and others, the ASCII encoding coincide so not so obvious.
>>
>> Cheers
>> --
>> Niclas Hedhman, Software Developer
>> http://www.qi4j.org - New Energy for Java
>>
>> I live here; http://tinyurl.com/2qq9er
>> I work here; http://tinyurl.com/2ymelc
>> I relax here; http://tinyurl.com/2cgsug
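For whoever picks up the bug, here is a rough sketch of the two ways of calling createXMLStreamReader() discussed above. It is only an illustration, not what the serializer currently does: the class name EncodingSketch and the helper readText() are made up for this example. One detail worth noting is that in javax.xml.stream the two-argument overload takes the encoding as a String name rather than a java.nio.charset.Charset. Passing null to the helper leaves encoding detection to the parser (which reads the encoding declared in the document); passing a name hands the caller's choice to the parser.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
final class EncodingSketch {
    // Returns the concatenated character data of the document.
    // encoding == null: the parser determines the encoding itself.
    // encoding != null: the caller-supplied encoding name is passed to the parser.
    static String readText(InputStream in, String encoding) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = (encoding == null)
            ? factory.createXMLStreamReader(in)
            : factory.createXMLStreamReader(in, encoding);

        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                // Character data may arrive in several CHARACTERS events,
                // so append each chunk as it comes.
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
```

Calling EncodingSketch.readText(stream, null) on a file whose declaration matches its actual bytes should need no extra information from the caller. The case Greg describes, where the declaration and the saved encoding disagree, is the one where an explicit encoding argument might help, and how a given parser resolves that conflict would need to be checked as part of the bug.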
