> The thing is, in this day and age, the *right* encoding is UTF-8.

Nonsense.  UTF-8 has three problems I can think of immediately, and probably more that would take me longer to think of, each of which renders it broken for various use cases:
(1) Characters are not all the same size: a codepoint takes anywhere from one to four octets (up to six under the original, pre-RFC 3629 definition).  (See the short sketch below.)

(2) It confuses character set (the set of abstract characters), encoding (the mapping between characters and codepoints), and serialization (the mapping between streams of codepoints and streams of storage or transmission units, bytes or bits).

(3) Because it is always representing Unicode codepoints (see point 2), it inherits all of Unicode's problems, such as the ridiculously huge tables needed for normalization.

There is no one-size-fits-all character set, nor encoding, nor even serialization, no matter what the priests of the UTF-8 religion would have you believe.  If you want to argue that UTF-8 is the best default, that at least is worth discussing.  But maintaining that there is any single "the right" character set, encoding, or serialization is...nonsense.  There is, at most, "right" for a particular use case, or set of use cases.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
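For concreteness, a minimal sketch of points (1) and (3), written in Python 3 (the language is an arbitrary choice; nothing about UTF-8 requires it) using only the standard library's unicodedata module:

    import unicodedata

    # (1) UTF-8 is variable-width: one to four octets per codepoint.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):   # A, e-acute, euro sign, musical G clef
        print("U+%04X -> %d octet(s)" % (ord(ch), len(ch.encode("utf-8"))))

    # (3) The same visible text can be spelled as different codepoint
    # sequences, which is why normalization (and its big tables) exists.
    precomposed = "\u00e9"                                   # e-acute as one codepoint
    decomposed  = unicodedata.normalize("NFD", precomposed)  # 'e' plus combining acute accent
    print(precomposed == decomposed)                         # False: distinct strings
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True once normalized

Running it prints 1, 2, 3, and 4 octets for the four sample codepoints, then False and True for the normalization comparisons.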