On Wed, 14 Dec 2005, Bill Hacker wrote:

> Simply set 'UTF-8' in the meta-data of the webpage.
>
> ISO-8859-1 is a (mostly) proper subset, but not the reverse.

That isn't strictly true. Confusion arises because UTF-8 is not, strictly, a character set. It is a way of encoding (packing, really) a sequence of numbers, whose values need up to 21 bits to represent in binary, into a string of 8-bit bytes, where the first 128 numbers are represented by single bytes.

Unicode is a coded character set that defines character code points, also values of up to 21 bits, though most characters in common use fit within 16 bits. Unicode is often represented using the UTF-8 value encoding, but not always. Some applications use straight 16-bit values. However, in the context of many applications, including, it seems, the web, the name "UTF-8" has become synonymous with "Unicode, encoded as UTF-8".

ISO-8859-1 code values are a subset of Unicode code values. However, ISO-8859-1 code values are always represented as single bytes. This means that the bytes for values 0-127 are indeed identical in ISO-8859-1 and UTF-8. However, the remaining ISO-8859-1 code points (128-255), though they denote the same characters as the corresponding Unicode code points, are not represented in the same way. In ISO-8859-1 these values are single bytes; in Unicode/UTF-8 they require two bytes.

Take, for example, the character whose Unicode and ISO-8859-1 code point is 00F7 (the division sign). In ISO-8859-1 this would be the single byte with hex value F7; in UTF-8 this value is coded as the two bytes C3, B7.

Therefore, if you have a file that contains ISO-8859-1 text with characters in the range 128-255, you cannot just pretend that it is UTF-8 Unicode. In fact, it will most probably be invalid as a UTF-8 file, because the bytes with the top bit set won't, in general, form valid UTF-8 sequences. Some of them, though (e.g. the sequence C3, B7), will be valid as UTF-8. So you will get a mess.

--
Philip Hazel            University of Cambridge Computing Service,
[EMAIL PROTECTED]      Cambridge, England. Phone: +44 1223 334714.
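
The F7 versus C3, B7 example in the message can be checked with a short Python sketch. This is only an illustration and not part of the original post: it assumes Python 3 and its built-in codecs, and the helper name is_valid_utf8 is made up for the purpose.

```python
# Illustrative sketch: compare the ISO-8859-1 and UTF-8 byte forms of U+00F7.
division_sign = "\u00f7"  # U+00F7 DIVISION SIGN

latin1_bytes = division_sign.encode("iso-8859-1")  # one byte
utf8_bytes = division_sign.encode("utf-8")         # two bytes


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The single ISO-8859-1 byte, taken on its own, is not a valid UTF-8 sequence.
lone_byte_valid = is_valid_utf8(latin1_bytes)

# The bytes C3 B7 are two characters in ISO-8859-1 but one character in UTF-8,
# which is how a mislabelled file turns into a mess.
ambiguous = b"\xc3\xb7"
as_latin1 = ambiguous.decode("iso-8859-1")
as_utf8 = ambiguous.decode("utf-8")

print("ISO-8859-1 bytes:", latin1_bytes.hex())
print("UTF-8 bytes:     ", utf8_bytes.hex())
print("lone byte is valid UTF-8:", lone_byte_valid)
print("C3 B7 read as ISO-8859-1:", len(as_latin1), "characters")
print("C3 B7 read as UTF-8:     ", len(as_utf8), "character")
```

To convert a real ISO-8859-1 file rather than relabel it, the same two calls apply in sequence: decode the bytes as "iso-8859-1", then encode the resulting text as "utf-8".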
