Hi,
I am starting to use Castor and I don't understand one thing.
All the examples I have seen use this construct to unmarshal a file:
Reader reader = new FileReader("test.xml");
Person person = (Person)Unmarshaller.unmarshal(Person.class, reader);
This is plainly *WRONG*. The FileReader class converts bytes to characters using the operating system's default encoding, which may be *different* from the encoding of the XML file, i.e. the one specified in
<?xml version="1.0" encoding="UTF-8" ?>
This may go unnoticed by native English speakers who never use any characters outside of US-ASCII, but it matters for languages that use other characters.
It matters especially because the default encoding for XML files is UTF-8, and the Castor Marshaller uses it by default, so reading such XML files on any system that is not using a UTF-8 locale will break non-US-ASCII characters.
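To illustrate the breakage, here is a minimal sketch using only the JDK. It does not involve Castor at all, and the class name ReaderEncodingDemo and the helper readAll are made up for this example. An InputStreamReader with an explicit ISO-8859-1 charset stands in for a FileReader running on a system whose default encoding is not UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Castor.
class ReaderEncodingDemo {

    // Decodes all bytes through a Reader with the given charset,
    // the way FileReader does with the platform default charset.
    static String readAll(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Kub\u00e1"; // "Kuba" with an acute accent on the last letter
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Reader charset matches the file encoding: text survives.
        String right = readAll(utf8Bytes, StandardCharsets.UTF_8);
        // Reader charset differs from the file encoding: text is corrupted.
        String wrong = readAll(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
        System.out.println(wrong.length());         // 5, the accented letter became two characters
    }
}
```

The XML parser never gets a chance to fix this, because by the time it sees the characters the Reader has already decoded them.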
I think the correct way is:
import org.xml.sax.InputSource;
..
Unmarshaller un = new Unmarshaller(Person.class);
InputSource src = new InputSource(new FileInputStream("test.xml"));
Person person = (Person) un.unmarshal(src);
or, better, the Unmarshaller needs a new method
Object unmarshal(java.lang.Class c, java.io.InputStream stream)
Pre-parsing the XML declaration of the file to extract the encoding before the Reader is created is not a proper solution; that is the XML parser's job, and it already does it.
Can somebody please explain to me how it is possible that such a fundamental bug is in all the examples? Or am I missing something?
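As a rough check that the parser really handles the encoding itself when it is given bytes, here is a small sketch with the JDK's built-in JAXP parser (not Castor; the class ParserEncodingDemo and its helpers encode and readName are invented for this example). The same reading code gets a UTF-8 document and an ISO-8859-1 document as byte streams wrapped in an InputSource, and is never told the encoding:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Castor.
class ParserEncodingDemo {

    // Builds a tiny XML document and encodes it in the charset
    // named in its own XML declaration.
    static byte[] encode(String name, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                + "<person><name>" + name + "</name></person>";
        return xml.getBytes(charset);
    }

    // Hands raw bytes to the parser via InputSource; the parser reads
    // the encoding from the XML declaration on its own.
    static String readName(byte[] bytes) {
        try {
            InputSource src = new InputSource(new ByteArrayInputStream(bytes));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(src);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String name = "Kub\u00e1";
        // Both documents decode correctly regardless of the platform default charset.
        System.out.println(name.equals(readName(encode(name, StandardCharsets.UTF_8))));      // true
        System.out.println(name.equals(readName(encode(name, StandardCharsets.ISO_8859_1)))); // true
    }
}
```

I would expect an Unmarshaller fed with an InputSource built from an InputStream to behave the same way, since it delegates to a SAX parser, but I have only verified this with the plain JAXP parser above.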
Martin
--
Martin Kuba
Supercomputing Center Brno
Institute of Computer Science, Masaryk University
Botanicka 68a, 60200 Brno, CZ
email: [EMAIL PROTECTED]
http://www.ics.muni.cz/~makub/
mobil: +420-603-533775
