Are you aware that Java was created (or frozen) when Unicode required
16 bits?  (It wasn't a mistake at the time.)

Yes, I remember the age when Unicode a la UCS-2 was the future, and
there was a big push to move to it. It did strike me as somewhat
misguided at the time, though; it seemed there would forever be huge
incompatibility issues in software. Using 16-bit words in streams
seemed perilous, and highly internet-unfriendly. Plus, the vast
majority of software was ASCII-only, and would never join the Unicode
world.

When I first saw the UTF-8 encoding description, it was an epiphany of sorts.
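
For anyone who hasn't seen it, here is a minimal sketch of that byte
layout in Perl, covering just the one-, two-, and three-byte forms
(encode_utf8_cp is a name made up for illustration). The elegance is
that ASCII passes through untouched, and every continuation byte
carries the 10xxxxxx prefix, so a reader can resynchronize anywhere
in a byte stream:

    use strict;
    use warnings;

    # UTF-8 byte layout for code points below U+10000:
    #   0xxxxxxx                     U+0000..U+007F (plain ASCII)
    #   110xxxxx 10xxxxxx            U+0080..U+07FF
    #   1110xxxx 10xxxxxx 10xxxxxx   U+0800..U+FFFF
    sub encode_utf8_cp {
        my ($cp) = @_;
        return chr($cp) if $cp < 0x80;
        return chr(0xC0 |  ($cp >> 6))
             . chr(0x80 |  ($cp & 0x3F))           if $cp < 0x800;
        return chr(0xE0 |  ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 |  ($cp & 0x3F));
    }

    printf "%vX\n", encode_utf8_cp(0x20AC);   # Euro sign -> E2.82.AC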

> Normally, you should not have to ever convert strings between
> encodings.

Then how do you process, say, a multi-part MIME body that has parts
in different character encodings?

Excellent example. Email is absolutely something you can work with on
a byte-by-byte basis, with no need to consider characters. You can
drop big blocks of bytes out to conversion routines, and you don't
ever have to know what the Unicode code points are.
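
To make that concrete, here is a very rough Perl sketch (not a real
MIME parser, and the boundary would really come from the Content-Type
header) that splits a multipart body and fishes each part's declared
charset out of its headers, all as raw byte operations:

    use strict;
    use warnings;

    # Split a multipart body on its boundary. Pure byte-string work;
    # no code point is ever examined.
    sub split_parts {
        my ($body, $boundary) = @_;
        my @parts = split /^--\Q$boundary\E(?:--)?[ \t]*\r?\n?/m, $body;
        shift @parts;                       # drop the preamble
        return grep { length } @parts;
    }

    my $boundary = 'frontier';              # really from Content-Type
    my $raw_body = do { local $/; <STDIN> };    # whole body as bytes

    for my $part (split_parts($raw_body, $boundary)) {
        my ($headers, $bytes) = split /\r?\n\r?\n/, $part, 2;
        next unless defined $bytes;
        my ($charset) = $headers =~ /charset="?([A-Za-z0-9_.:-]+)"?/i;
        # here you would hand $bytes off to a conversion routine
        # (see the iconv sketch further down); still no code points
        printf "part in %s, %d bytes\n",
               $charset || 'us-ascii', length $bytes;
    }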

Not every tool should have to worry about encodings, and if they all
do, we're only going to end up with tons of non-i18n programs being
written. You should reasonably be able to write an email program in
Perl that drops out to iconv and openssl etc. as needed to convert
things to UTF-8, and otherwise doesn't care about encoding at all and
makes no special considerations for it.
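
Something like this, as a minimal sketch of the "drops out to iconv"
step (to_utf8 is a made-up name; it assumes iconv(1) is on the PATH
and knows the charset names your mail uses, and it stages the bytes
in a temp file to avoid the deadlock you can hit writing to and
reading from a child through pipes on one side):

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Convert a block of bytes from $charset to UTF-8 by handing the
    # whole thing to iconv(1). The caller never sees a code point.
    sub to_utf8 {
        my ($charset, $bytes) = @_;
        my ($fh, $path) = tempfile(UNLINK => 1);
        binmode $fh;
        print {$fh} $bytes;
        close $fh or die "close: $!";
        open(my $out, '-|', 'iconv', '-f', $charset, '-t', 'UTF-8', $path)
            or die "cannot run iconv: $!";
        local $/;                           # slurp the converted bytes
        my $utf8 = <$out>;
        close $out;
        return $utf8;
    }

    print to_utf8('ISO-8859-1', "caf\xE9 au lait\n");   # UTF-8 out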

> It's just
> not your problem, plus it introduces a ton of potential headaches.
> Just assume your input is in the encoding it's supposed to be in.

You never deal with multiple inputs?

All the time :)

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
