On Thu, 2006-12-28 at 11:15 +0000, Andrew Shirley wrote:
> On Thu, Dec 28, 2006 at 11:30:07AM +0100, DECAFFMEYER MATHIEU wrote:
> > Hi,
> >
> > I am using Jakarta Configuration to manipulate some XML files.
> >
> > What do u suggest me to do ?
> > Thank u for any help ! Will be greatly appreciated !
>
> This may be that the file isn't actually UTF-8 i.e. it contains some
> extended ASCII characters. The usual problem in the uk is the pound
> sign but the euro is probably a good candidate as well. I would check
> that you are only using the standard (i.e. < 128) ascii characters.

The UTF-8 encoding can represent any character at all, not just ASCII.

The error message you are seeing is generated not by
commons-configuration but by the underlying XML parser:

> > Caused by: java.io.UTFDataFormatException: Octet 2 incorrect dans la
> > séquence UTF-8 à 3-octets.
> >     at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown

(In English: "Invalid byte 2 of 3-byte UTF-8 sequence.")

In other words, your input file is not valid UTF-8: the XML parser has
encountered a sequence of bytes that does not correspond to any valid
character. You will need to fix the file so that it is valid UTF-8.
commons-configuration cannot process your data if the XML parser
refuses to parse it.

One possibility is that the input file is actually encoded in an 8-bit
character encoding such as Latin-1 (ISO-8859-1), and not in UTF-8 at
all.

With UTF-8, a byte from 0 through 127 is always an ASCII character,
while a byte from 128 through 255 can appear only as part of a
multibyte sequence (two to four bytes) that represents a character
outside the ASCII set. Each such sequence starts with a lead byte and
continues with bytes in the range 128 through 191; a lead byte followed
by anything else is an error.

With an 8-bit encoding like Latin-1, values from 128 to 255 are NOT
parts of multibyte sequences. Each one stands on its own for one of a
fixed set of 128 "extended characters", and there is no way to
represent a character that is not in the set associated with that
encoding.

Regards,

Simon
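
P.S. One way to test the Latin-1 theory before touching the file is to decode its bytes strictly and see whether that fails. What follows is only a sketch, assuming Java 7 or later. The class name Utf8Check and both method names are made up for illustration; they are not part of commons-configuration or Xerces. For what it is worth, a Latin-1 "é" is the single byte 0xE9, which UTF-8 treats as the lead byte of a 3-byte sequence, so a Latin-1 file containing it could plausibly produce the kind of error quoted above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class Utf8Check {

    // Returns true if the bytes decode as well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently
    // substituting a replacement character for bad input.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Interprets the bytes as ISO-8859-1 (Latin-1) and re-encodes
    // the resulting text as UTF-8. Every byte value is a valid
    // Latin-1 character, so this never fails, but the output is
    // only meaningful if the input really was Latin-1.
    static byte[] latin1ToUtf8(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data)
                ? "valid UTF-8"
                : "NOT valid UTF-8");
    }
}
```

If the check reports that the file is not valid UTF-8 and the text looks right when read as ISO-8859-1, then either writing out the result of latin1ToUtf8, or changing the encoding named in the file's XML declaration to ISO-8859-1, should let the parser read it.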
