Re: Character '€'

Greg Brown Sun, 14 Nov 2010 17:03:47 -0800

The problem is that even if the PI specifies UTF-8, for example, the file
itself may have been saved in a different encoding, so the two may not match.
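
To make the mismatch concrete, here is a minimal sketch, not anything we ship today. It assumes the caller knows the real encoding of the bytes; the class EncodingMismatch and the helper readText are hypothetical names used only for illustration. The document declares UTF-8 but is saved as ISO-8859-1, and the helper decodes the bytes itself and hands the parser a Reader, so the parser has no bytes left to decode:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingMismatch {
    // Hypothetical helper: decode the bytes with the charset the caller says
    // the file was really saved in, then parse from the resulting Reader.
    static String readText(byte[] xml, Charset actual) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), actual);
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(reader);
            StringBuilder text = new StringBuilder();
            while (parser.hasNext()) {
                // Text may arrive in several CHARACTERS events; collect them all.
                if (parser.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(parser.getText());
                }
            }
            parser.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // The declaration claims UTF-8, but the bytes are written as ISO-8859-1.
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><label>caf\u00e9</label>";
        byte[] saved = doc.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readText(saved, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the same bytes as a plain InputStream would leave the parser to trust the declared UTF-8, and the lone 0xE9 byte is not valid UTF-8. The Reader route only helps when the caller really does know the encoding.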


On Nov 14, 2010, at 8:00 PM, Niclas Hedhman wrote:

> On Mon, Nov 15, 2010 at 8:52 AM, Greg Brown <[email protected]> wrote:
>>> Doesn't the XML deserializer you use just work correctly if you pass
>>> an InputStream instead of a Reader??
>> 
>> 
>> Actually, I think a Reader would work but we don't currently expose that 
>> API. We use javax.xml.stream.XMLInputFactory#createXMLStreamReader() to 
>> process the XML, which takes an InputStream as an argument. What we should 
>> probably do is allow the caller to specify the character set to read (there 
>> is another version of createXMLStreamReader() that takes both an InputStream 
>> and an encoding name as a String).
> 
> That is incorrect. The XML specification says that the <?xml> processing
> instruction is in (IIRC) ASCII and that it contains the encoding of the
> rest of the document, such as <?xml version="1.0" encoding="UTF-8"
> ?>, and compliant parsers should understand this. So, for instance, if
> the document is in UTF-16, the <?xml?> PI is NOT, and a regular text
> editor would have a problem handling that. For UTF-8, ISO-8859-X
> and others, the ASCII range coincides, so this is not so obvious.
> 
> Cheers
> -- 
> Niclas Hedhman, Software Developer
> http://www.qi4j.org - New Energy for Java
> 
> I  live here; http://tinyurl.com/2qq9er
> I  work here; http://tinyurl.com/2ymelc
> I relax here; http://tinyurl.com/2cgsug
