Thank you for your mail. The Avaleo office is closed from December 23 until January 2, when we will address your request. If you have an urgent request, please call one of the following people:

- From December 23rd to December 27th (both days included), please call Niels Dyrnesli on +45 40741496.
- From December 28th to January 1st (both days included), please call Anders Kofoed on +45 29442941.

Kind regards
Anders Kofoed
Avaleo ApS
-----Original Message-----
Reply-To: "Jakarta Commons Users List" <[email protected]>
Subject: Re: [Configuration] UTF-8 encoding problem
From: Simon Kitching <[EMAIL PROTECTED]>
To: Jakarta Commons Users List <[email protected]>
Date: Fri, 29 Dec 2006 01:00:51 +1300

> On Thu, 2006-12-28 at 11:15 +0000, Andrew Shirley wrote:
> > On Thu, Dec 28, 2006 at 11:30:07AM +0100, DECAFFMEYER MATHIEU wrote:
> > >
> > > Hi,
> > >
> > > I am using Jakarta Configuration to manipulate some XML files.
> > >
> > > What do you suggest I do?
> > >
> > > Thank you for any help! It will be greatly appreciated!
> >
> > It may be that the file isn't actually UTF-8, i.e. it contains some
> > extended ASCII characters. The usual problem in the UK is the pound
> > sign, but the euro is probably a good candidate as well. I would check
> > that you are only using the standard (i.e. < 128) ASCII characters.
>
> The UTF-8 encoding can handle any Unicode character, not just ASCII.
> The error message you are seeing is not being generated by
> commons-configuration, but by the underlying XML parser:
>
> > Caused by: java.io.UTFDataFormatException: Invalid byte 2 of 3-byte
> > UTF-8 sequence.
> >   at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown
>
> In other words, your input file is corrupt: the XML parser has
> encountered a sequence of bytes that does not correspond to any valid
> UTF-8 character.
>
> You will need to fix your input file so that it is valid UTF-8. There is
> no way that the commons-configuration library can process your data if
> the XML parser refuses to parse it.
>
> One possibility is that the input file is actually encoded in an 8-bit
> character encoding such as LATIN-1, NOT UTF-8 at all.
>
> With UTF-8, any byte from 0 through 127 is an ASCII character, while
> every byte from 128 through 255 belongs to a multibyte sequence (a lead
> byte followed by one to three continuation bytes) that represents a
> character that is NOT in the ASCII set.
>
> With an 8-bit encoding like LATIN-1, values from 128 to 255 are NOT
> multibyte sequences, but instead represent a specific set of 128
> "extended characters", and there is no way to represent a character that
> is not in the set associated with that encoding.
>
> Regards,
> Simon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
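Simon's diagnosis can be checked, and the likely fix applied, with a few lines of Java. This is only a sketch: the class name `EncodingFix` and its methods are hypothetical, it uses `java.nio` APIs from Java 7 or later (newer than anything in this 2006 thread), and `latin1ToUtf8` assumes the file really is ISO-8859-1 (Latin-1), which should be confirmed before converting. `isValidUtf8` decodes strictly, so a Latin-1 `é` (byte 0xE9) followed by a space is rejected: 0xE9 is the lead byte of a 3-byte UTF-8 sequence and 0x20 is not a valid continuation byte, which is probably the same kind of fault the parser reported.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of commons-configuration.
public class EncodingFix {

    /** Strictly decodes the bytes as UTF-8; any malformed sequence makes this return false. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /**
     * Re-encodes a file ASSUMED to be ISO-8859-1 (Latin-1) as UTF-8.
     * Every byte is a valid Latin-1 character, so this never fails on content;
     * the result is only correct if the file really was Latin-1.
     */
    static void latin1ToUtf8(Path in, Path out) throws IOException {
        byte[] raw = Files.readAllBytes(in);
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        if (isValidUtf8(Files.readAllBytes(in))) {
            System.out.println(in + " is already valid UTF-8");
        } else {
            // Write the converted copy next to the original rather than overwriting it.
            Path out = Paths.get(args[0] + ".utf8");
            latin1ToUtf8(in, out);
            System.out.println("Wrote UTF-8 copy to " + out);
        }
    }
}
```

Run as `java EncodingFix config.xml`. The conversion leaves the ASCII bytes of the XML declaration untouched, so the declaration should read `encoding="UTF-8"` (or omit the encoding, since UTF-8 is the XML default) in the converted copy. If the file is known to be Latin-1, declaring `encoding="ISO-8859-1"` and leaving the bytes alone may also be enough for the parser to accept it.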
