RE: Invalid byte 1 (£) of a 1-byte sequence

Giulio Troccoli Fri, 14 Aug 2009 04:47:19 -0700



-----Original Message-----


> From: John Lilley [mailto:[email protected]]
> Sent: 14 August 2009 12:42
> To: [email protected]
> Subject: RE: Invalid byte 1 (£) of a 1-byte sequence
>
> I will also quite likely say something stupid, but here goes :)
>
> I suggest that there are two possibilities:
>
> 1) Xerces on AIX is ignoring your request to use UTF-8, and
> is instead using the default ISO 8859-1
> 2) Xerces, or the underlying transcoder it uses, is
> decoding UTF-8, but is too lenient when it encounters the
> invalid byte sequence, and makes some ad-hoc (or buggy)
> attempt to convert the code anyway.
>
> I would suggest this experiment: feed the parser a document
> containing the valid sequence (C2 A3) and see if it is parsed
> correctly.  If so, then the answer is most likely (2) else
> (1).  Armed with that information you can seek the
> appropriate corrective action.

It won't be easy but I'll give it a go and report back.

In any case, that would be a bug in Xerces, wouldn't it? So the "appropriate
corrective action" would be to change my application to work around this bug,
wouldn't it?
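
For reference, the two inputs for the suggested experiment could be built along
these lines. This is only a sketch: it uses Python's bundled expat-based parser
as a stand-in for Xerces, and the <price> element and the try_parse helper are
made up for illustration. It shows what a strict UTF-8 decoder is expected to
do: accept C2 A3 and reject a bare A3.

```python
import xml.etree.ElementTree as ET

PROLOG = b'<?xml version="1.0" encoding="UTF-8"?>'

# Pound sign encoded correctly in UTF-8: the two bytes C2 A3.
VALID_DOC = PROLOG + b"<price>\xc2\xa3</price>"

# Pound sign as the single ISO 8859-1 byte A3, which is not valid UTF-8
# (A3 is a continuation byte and cannot start a sequence).
INVALID_DOC = PROLOG + b"<price>\xa3</price>"


def try_parse(doc):
    """Return the element text on success, or None if the parser rejects it."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


if __name__ == "__main__":
    for name, doc in (("valid C2 A3", VALID_DOC), ("bare A3", INVALID_DOC)):
        print(name, "->", repr(try_parse(doc)))
```

The same two byte strings can be written to files and fed to the Xerces build
on AIX. If the C2 A3 document comes back as a single pound sign, the UTF-8
request is being honoured and the leniency in (2) is the more likely
explanation.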

Giulio
