On Thu, 21 Feb 2002, Paul Smiley <[EMAIL PROTECTED]> wrote:

> "...really use UTF-8" - am I not using UTF-8 when using
> 'encoding="UTF-8"'?
No, you only claim to be using UTF-8. æ, stored as the single byte 0xE6, is the ISO-8859-1 encoded version of the Unicode character with the number 230 (U+00E6). The UTF-8 encoded version consists of the two bytes 0xC3 0xA6, which show up as Ã¦ when they are read as ISO-8859-1.

> Is there some type of byte mark as there is with UTF-16?

UTF-8 uses between one and four bytes to encode a character (at most three for anything in the sixteen bit range); only the first 128 characters, i.e. ASCII, use a one byte encoding. I'm sure you'll find more than enough resources on the web that will give you the full details.

You could write your XML file using Java and set the encoding of your OutputStreamWriter to UTF8 to see what it will look like.

> Also, I need to support Kanji and Chinese characters, so I believe
> that UTF-8 and ISO-8859-1 are inadequate.

UTF-8 is probably fine, ISO-8859-1 is completely inadequate. UTF-8 is one encoding for the complete Unicode character set, as is UTF-16. ISO-8859-1 is a completely different character set of only 256 characters that happens to be identical with the first 256 characters of Unicode, and it is the character set used by default on most operating systems in the US and western Europe.

Stefan
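
To see the byte-level difference for yourself, here is a minimal sketch in Java along the lines of the OutputStreamWriter suggestion above. It is only an illustration: the class name EncodingDemo, the helpers writeXml and hex, and the tiny <doc> document are made up for this example. OutputStreamWriter, ByteArrayOutputStream, UncheckedIOException and StandardCharsets are standard JDK classes (Java 8 or later is assumed).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: serialize a tiny XML document through an
    // OutputStreamWriter that really uses the charset named in the declaration.
    static byte[] writeXml(String text, String charsetName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charsetName)) {
            w.write("<?xml version=\"1.0\" encoding=\"" + charsetName + "\"?>");
            w.write("<doc>" + text + "</doc>");
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown charset name.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String ae = "\u00E6";    // Unicode character number 230
        String kanji = "\u6F22"; // a Kanji character, outside ISO-8859-1

        System.out.println(hex(ae.getBytes(StandardCharsets.ISO_8859_1))); // E6
        System.out.println(hex(ae.getBytes(StandardCharsets.UTF_8)));      // C3 A6
        System.out.println(hex(kanji.getBytes(StandardCharsets.UTF_8)));   // E6 BC A2

        // The same document, written with two different encodings.
        System.out.println(hex(writeXml(ae, "ISO-8859-1")));
        System.out.println(hex(writeXml(ae, "UTF-8")));
    }
}
```

The encoding declaration in the XML prolog only labels the bytes; it is the charset handed to the OutputStreamWriter that decides what actually gets written. In this sketch the two are kept in sync by passing the same name to both, which is exactly what goes wrong when a file is saved as ISO-8859-1 but declared as UTF-8.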
