Are you aware that Java was created (or frozen) when Unicode required 16 bits? (It wasn't a mistake at the time.)
Yes, I remember the age when Unicode à la UCS-2 was the future, and there was a big push to move to it. It did strike me as somewhat misguided though, and it seemed that there would forever be huge incompatibility issues in software. Using 16-bit words in streams seemed perilous, and highly Internet-unfriendly. Plus, the vast majority of software was ASCII-only and would never join the Unicode world. When I first saw the UTF-8 encoding description, it was an epiphany of sorts.
> Normally, you should not have to ever convert strings between
> encodings.

Then how do you process, say, a multi-part MIME body that has parts in different character encodings?
Excellent example. Email is absolutely something that you can work with on a byte-by-byte basis, with no need to think about characters. You can hand big blocks of bytes off to conversion routines and never have to know what the Unicode code points are. Not every tool has to worry about encodings; if every one of them had to, we would only end up with tons of non-i18n programs being written. You should reasonably be able to write an email program in Perl that drops out to iconv and openssl etc. as needed to convert things to UTF-8, and otherwise doesn't care about encoding at all and makes no special considerations for it.
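
To make that hand-off concrete, here is a rough sketch in Perl. It assumes the part's charset has already been parsed out of its Content-Type header, and the to_utf8 name is just mine for illustration; the point is that the program only shovels bytes through the external iconv tool and never looks at a single code point itself.

#!/usr/bin/perl
# Sketch: convert one MIME part's body to UTF-8 by handing the raw bytes
# to the external iconv tool; the script never inspects code points.
use strict;
use warnings;
use File::Temp qw(tempfile);

sub to_utf8 {
    my ($charset, $bytes) = @_;   # $charset assumed to come from the part's Content-Type
    my ($fh, $tmp) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $bytes;           # dump the opaque byte block to a temp file
    close $fh;
    open my $pipe, '-|', 'iconv', '-f', $charset, '-t', 'UTF-8', $tmp
        or die "cannot run iconv: $!";
    binmode $pipe;
    local $/;                     # slurp everything iconv writes back
    my $converted = <$pipe>;
    close $pipe;
    return $converted;
}

Everything else in such a program just moves bytes around; this one spot is the only place that even knows encodings exist.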
> It's just
> not your problem, plus it introduces a ton of potential headaches.
> Just assume your input is in the encoding it's supposed to be in.

You never deal with multiple inputs?
All the time :)
