Re: Subject Unicode

John Gilmore Fri, 10 Jan 2014 06:51:47 -0800

I have refrained from saying anything about this topic because I
judged that anything I said would be predictable.  I am a well-known
offender, a flagrant Unicode, i.e., minimally UTF-16, advocate.


Now, however, Charles Mills has pushed me into posting something.  He writes:

<begin extract>
That is called UTF-16. Pretty good but still not very efficient.
</end extract>

As usual, it depends.  If one's problems are always with a single pair
of natural languages, one of which is English (ENG or ENU), which
makes little use of orthographically marked letters, a satisfactory
UTF-8 'solution' may be, indeed usually is, possible.

Something can, that is, be done in a UTF-8 framework with such
language pairs as

o English and French,

o English and German, or even

o English and Polish.

As soon, however, as you need to support

o three or more different roman-alphabet natural languages, or

o a roman-alphabet language and a non-alphabetic Asian language

you need UTF-16.
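
To make the "it depends" concrete, here is a minimal sketch in Python
that counts the bytes a string occupies in UTF-8 and in UTF-16.  The
sample strings and the helper name encoded_sizes are illustrative
assumptions, not taken from any real workload, and storage size is
only one of the considerations in play.

```python
# Sample strings are illustrative assumptions; substitute your own data.
SAMPLES = {
    "English": "The quick brown fox",
    "French": "Le café est très réputé",
    "Polish": "Zażółć gęślą jaźń",
    "Japanese": "日本語のテキスト",
}


def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes without a BOM)."""
    utf8_len = len(text.encode("utf-8"))
    # "utf-16-be" is used so that no byte-order mark inflates the count.
    utf16_len = len(text.encode("utf-16-be"))
    return (len(text), utf8_len, utf16_len)


for name, text in SAMPLES.items():
    chars, utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name:9} chars={chars:3} utf-8={utf8_len:3} utf-16={utf16_len:3}")
```

Unaccented roman letters take one byte in UTF-8 and two in UTF-16;
accented roman letters take two bytes in both; most CJK characters
take three bytes in UTF-8 and two in UTF-16.  Which encoding is the
more compact therefore turns on the mix of text actually being stored.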

To put the matter more brutally, any new system being built today and
in particular any new system that is likely to interact, at whatever
remove, with web-based systems should use UTF-16.

The notion that the only efficient representation for character data
is an SBCS one is retrograde at best.  Continuing with it will make
trouble for those who do so; worse, it will ensure that the systems
they build are short-lived.  The ASCII vs EBCDIC dispute is no longer
of much interest.  They are both obsolescent, usable safely only in
what the international lawyers call municipal contexts.

John Gilmore, Ashland, MA 01721 - USA

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
