Could the difference be that on AIX, Xerces is built to use ICU for transcoding, 
and ICU does not throw an exception for an invalid UTF-8 sequence?

If you give the AIX parser a valid UTF-8 sequence above the ASCII range, does 
it work correctly?  The UTF-8 encoding of the pound sign is 0xC2 0xA3.
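
If it helps to double-check those byte values outside of Xerces, a few lines 
of Python will do it. This is only a sketch for inspecting the encodings; it 
has nothing to do with the Xerces or ICU APIs, and the variable names are 
made up for illustration.

```python
# The pound sign is the Unicode code point U+00A3.
pound = "\u00a3"

# UTF-8 needs two bytes for it; ISO-8859-1 needs only one.
utf8_bytes = pound.encode("utf-8")
latin1_bytes = pound.encode("iso-8859-1")

print(utf8_bytes.hex())    # c2a3
print(latin1_bytes.hex())  # a3
```

A test document containing the two-byte form should tell you whether the AIX 
build handles real UTF-8 correctly.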

john

-----Original Message-----
From: Giulio Troccoli [mailto:[email protected]] 
Sent: Thursday, August 13, 2009 8:50 AM
To: '[email protected]'
Subject: RE: Invalid byte 1 (£) of a 1-byte sequence

Thanks John, but my XML file is not in UTF-8, as I was trying to demonstrate 
with the hex dump extract (but if I'm wrong please let me know).

And since I tell Xerces that it is UTF-8, it throws the error, which is 
correct. All I wanted to know is why the same thing does not happen on AIX. 
Instead, on AIX, Xerces parses the file and even gets the pound sign. It seems 
to switch silently to ISO-8859-1 when it reaches the pound sign and sees that 
it is not valid UTF-8.
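
To illustrate what I mean, here is a small Python sketch of my own (nothing 
Xerces-specific; try_decode is just a helper name I made up). It takes the 
first few bytes from the hex dump extract and decodes them strictly as UTF-8 
and then as ISO-8859-1.

```python
# First five bytes of the hex dump extract: 20 A3 30 2E 35
sample = bytes([0x20, 0xA3, 0x30, 0x2E, 0x35])


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# A strict UTF-8 decoder rejects the lone 0xA3 byte.
print(try_decode(sample, "utf-8"))       # None

# ISO-8859-1 maps every byte to a character, so 0xA3 becomes the pound sign.
print(try_decode(sample, "iso-8859-1"))  # ' £0.5'
```

The first result is what I see on Windows; the second looks like what the AIX 
build gives me, although I can only guess at what happens internally.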

> -----Original Message-----
> From: John Lilley [mailto:[email protected]] 
> Sent: 13 August 2009 15:46
> To: [email protected]
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
> 
> We've had no trouble reading and writing UTF-8.  I set up an 
> example using the British pound symbol like you have, and 
> Xerces 2.8 correctly encodes and decodes it.  We do not even 
> supply an XML header, as it defaults to UTF-8.  Do you have a 
> code snippet that produces this output?
> 
> john
> 
> -----Original Message-----
> From: Giulio Troccoli [mailto:[email protected]]
> Sent: Thursday, August 13, 2009 4:19 AM
> To: [email protected]
> Subject: Invalid byte 1 (£) of a 1-byte sequence
> 
> Hello everybody.
> 
> We have been using Xerces and Xalan for a few years, and 
> I recently upgraded to Xerces 2.8 and Xalan 1.10. I have 
> personally built both on Windows and AIX.
> 
> One of our applications produces an XML file that another 
> application processes. This second application throws the 
> error in this email subject.
> 
> First of all I would like to make sure I have my facts right.
> 
> The XML specifies an encoding of UTF-8 with
> <?xml version="1.0" encoding="UTF-8"?> in the first line. However, 
> I don't think it's been saved in UTF-8 because if I open it 
> as binary and go to where the £ is, I can see the following:
> 
> BEFCE0: 20 A3 30 2E 35 E8 2E 3C  2F 6C 69 6E 65 3E 0A 20   
> £0.58.</line>.
> 
> I was expecting £ to be encoded with 2 bytes. Am I correct 
> in assuming this?
> 
> If I am correct, then the error is correct too.
> 
> My question is about AIX. I don't have the error in AIX and 
> the XML document is parsed correctly. Also I didn't have any 
> problem with the pound sign with the old versions of Xerces 
> and Xalan, 2.1 and 1.4 respectively, but that doesn't matter now.
> 
> Would anyone be in a position to confirm that this is a bug 
> in Xerces 2.8 on AIX?
> 
> If, of course, I change the encoding in the XML to ISO-8859-1 
> it works on Windows too, and that's probably what we will do, 
> as it's the right thing to do. Still, I'd like to know 
> whether there is a bug on AIX (so that I can say "it's a bug" 
> when they ask me "why does it work on AIX then?")
> 
> Thanks
> Giulio
> 
> 
> Linedata Services (UK) Ltd
> Registered Office: Bishopsgate Court, 4-12 Norton Folgate, 
> London, E1 6DB
> Registered in England and Wales No 3027851    VAT Reg No 778499447
> 
> 
> 
> 
