The problem is that even if the PI specifies UTF-8, for example, the file itself may have been saved in a different encoding, so the declared and actual encodings may not match.
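
To make the mismatch concrete, here is a rough sketch. It is not Pivot code; the class name EncodingMismatchSketch and the helpers roundTrips() and readText() are made up for illustration. The first helper shows that a document declaring UTF-8 but saved as ISO-8859-1 does not survive being decoded as UTF-8. The second passes a caller-supplied encoding to the two-argument createXMLStreamReader(), which in javax.xml.stream takes the encoding as a String name. How a parser resolves a conflict between an explicit encoding and the declaration is implementation-dependent as far as I know, so the sketch only parses a document that has no encoding declaration at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingMismatchSketch {

    // Hypothetical helper: true if text saved in one charset reads back
    // unchanged when decoded with another.
    static boolean roundTrips(String text, Charset savedAs, Charset readAs) {
        byte[] bytes = text.getBytes(savedAs);
        return new String(bytes, readAs).equals(text);
    }

    // Hypothetical helper: parse the bytes with a caller-supplied encoding
    // name and collect all character data.
    static String readText(byte[] xml, String encoding) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
            factory.createXMLStreamReader(new ByteArrayInputStream(xml), encoding);
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Declares UTF-8, but will be "saved" as ISO-8859-1 below.
        String declared =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>caf\u00e9</greeting>";

        // Trusting the declaration: the non-ASCII character is damaged.
        System.out.println(roundTrips(declared,
            StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // Using the encoding the file was actually saved in.
        System.out.println(roundTrips(declared,
            StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));

        // No encoding declaration; the caller supplies the encoding.
        byte[] undeclared = "<?xml version=\"1.0\"?><greeting>caf\u00e9</greeting>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readText(undeclared, "ISO-8859-1"));
    }
}
```

Letting the caller pass the encoding would at least give them a way out when they know how the file was really saved.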

On Nov 14, 2010, at 8:00 PM, Niclas Hedhman wrote:

> On Mon, Nov 15, 2010 at 8:52 AM, Greg Brown <[email protected]> wrote:
>>> Doesn't the XML deserializer you use just work correctly if you pass
>>> an InputStream instead of a Reader??
>>
>>
>> Actually, I think a Reader would work but we don't currently expose that
>> API. We use javax.xml.stream.XMLInputFactory#createXMLStreamReader() to
>> process the XML, which takes an InputStream as an argument. What we should
>> probably do is allow the caller to specify the character set to read (there
>> is another version of createXMLStreamReader() that takes both an InputStream
>> and a java.nio.charset.Charset).
>
> That is incorrect. XML specification says that the <?xml> processing
> instruction is in (IIRC) ASCII and it contains the encoding of the
> rest of the document., such as <?xml version="1.0" encoding="UTF-8"
> ?>, and compliant parsers should understand this. So, for instance, if
> the document is in UTF-16, the <?xml?> PI is NOT, and a regular text
> editor would have problem with handling that. For UTF-8, ISO-8859-X
> and others, the ASCII encoding coincide so not so obvious.
>
> Cheers
> --
> Niclas Hedhman, Software Developer
> http://www.qi4j.org - New Energy for Java
>
> I live here; http://tinyurl.com/2qq9er
> I work here; http://tinyurl.com/2ymelc
> I relax here; http://tinyurl.com/2cgsug
