I used a hexdump utility to examine the contents of your file.  It's
UTF-16 with a byte-order mark (BOM).  The BOM enables a parser to
determine the encoding and byte order without reference to the declared
encoding.  (With a 16-bit encoding, you have to know whether the
high-order byte comes first or second.  The BOM gives the parser a way
to figure this out.)
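
To make the BOM check concrete, here is a minimal sketch of how a
reader could sniff the first few bytes of a file.  The function name
detectBom is hypothetical; it is not part of Xerces-C and is not how
the parser is actually implemented.  It only illustrates the idea.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C API): names the encoding implied by
// a leading byte-order mark, or returns "unknown" when no BOM is present.
std::string detectBom(const unsigned char* data, std::size_t size)
{
    // Test the 4-byte UTF-32 marks first: the UTF-32LE BOM (FF FE 00 00)
    // starts with the same two bytes as the UTF-16LE BOM (FF FE).
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";  // high-order byte first
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";  // low-order byte first
    return "unknown";
}
```

A file whose hexdump starts with FF FE would be reported as UTF-16LE,
and one that starts with FE FF as UTF-16BE.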

You specified ISO-8859-1 as the encoding in the XML declaration, which
does not match the actual document encoding.  (Textpad won't and
shouldn't change the encoding declaration for you based on the encoding
you specify at save time.)  The parser found that it could not make
sense of the first character in the document body based on the specified
encoding, so it gave up.  Changing the encoding declaration to UTF-16 or
omitting it altogether allows the document to parse.

You probably got away with using Textpad to save as UTF-8 because UTF-8
and ISO-8859-1 represent the characters that you actually use in your
sample document the same way.  This is true only for the ASCII range
(U+0000 through U+007F), though, so if you save your document as UTF-8,
the declared encoding should be UTF-8.
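
As a rough illustration of where the two encodings part ways, the
sketch below encodes a single code point both ways.  The helper names
toLatin1 and toUtf8 are made up for this example, and they handle only
code points up to U+00FF, which is all that ISO-8859-1 can represent.

```cpp
#include <string>

// Hypothetical helper: ISO-8859-1 stores each code point U+0000..U+00FF
// as the single byte with the same value.
std::string toLatin1(unsigned int cp)
{
    return std::string(1, static_cast<char>(cp));
}

// Hypothetical helper: UTF-8 for code points up to U+00FF only.
// ASCII (below U+0080) is one byte; U+0080..U+00FF takes two bytes.
std::string toUtf8(unsigned int cp)
{
    if (cp < 0x80)
        return std::string(1, static_cast<char>(cp));
    std::string out;
    out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte: 110xxxxx
    out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation: 10xxxxxx
    return out;
}
```

For 'A' (U+0041) both helpers yield the single byte 41.  For U+00E9
(e with an acute accent), ISO-8859-1 yields the byte E9 while UTF-8
yields the two bytes C3 A9, so a parser told to expect the wrong one
will misread the document.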

If you haven't installed an error handler, you might want to.  It is how
the parser reports problems like this one back to your application.

> -----Original Message-----
> From: Stephane Negri [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, March 29, 2005 8:54 AM
> To: xerces-c-dev@xml.apache.org
> Subject: Re: Xerces Unicode
> 
> 
> >> What do you mean when you say the file is written in Unicode?
> >> UTF-8 is one of the three standard Unicode encodings (the other
> >> two being UTF-16 and UTF-32).  Does the encoding specified in the
> >> document match the actual encoding?  It might be helpful for you
> >> to send a sample document
> 
> 
> sorry, in fact I'm just trusting Textpad on this. It has different
> file types : ANSI, Unicode, Unicode(big Endian), UTF-8, ...
> 
> My file is (according to it) in Unicode, like this my SAXParser cannot
> read it. If I save it in UTF-8, the parsing succeed.
> 
> You can find in attachment my file sample.
> 
> Thanks,
> 
> Stephane
> 
