I have refrained from saying anything about this topic because I judged that anything I said would be predictable. I am a well-known offender: a flagrant advocate of Unicode, i.e., of minimally UTF-16.
Now, however, Charles Mills has pushed me into posting something. He writes:

<begin extract>
That is called UTF-16. Pretty good but still not very efficient.
</end extract>

As usual, it depends. If one's problems are always with a single pair of natural languages, one of which is English (ENG or ENU), which makes little use of orthographically marked letters, a satisfactory UTF-8 'solution' may be, and indeed usually is, possible. Something can, that is, be done in a UTF-8 framework with such language pairs as

o English and French,
o English and German, or even
o English and Polish.

As soon, however, as you need to support

o three or more different roman-alphabet natural languages, or
o a roman-alphabet language and a non-alphabetic Asian language,

you need UTF-16.

To put the matter more brutally, any new system being built today, and in particular any new system that is likely to interact, at whatever remove, with web-based systems, should use UTF-16. The notion that the only efficient representation for character data is an SBCS one is retrograde at best. Continuing with it will make trouble for those who do so; worse, it will ensure that the systems they build are short-lived.

The ASCII vs. EBCDIC dispute is no longer of much interest. Both are obsolescent, safely usable only in what international lawyers call municipal contexts.

John Gilmore, Ashland, MA 01721 - USA

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
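As a postscript on the "it depends" point: the size trade-off between the two encodings is easy to measure directly. The sketch below (my own illustrative sentences, not anyone's benchmark) compares byte lengths of the same text under UTF-8 and UTF-16 for the kinds of languages discussed above. Roughly: ASCII-heavy Latin-script text is smaller in UTF-8, while non-alphabetic Asian text is smaller in UTF-16.

```python
# Compare byte lengths of the same sentence under UTF-8 and UTF-16.
# Sample sentences are illustrative assumptions, not from the post.
samples = {
    "English":  "The quick brown fox",   # pure ASCII
    "Polish":   "Zażółć gęślą jaźń",     # Latin script with marked letters
    "Japanese": "吾輩は猫である",          # non-alphabetic Asian script
}

for language, text in samples.items():
    utf8_bytes = len(text.encode("utf-8"))
    utf16_bytes = len(text.encode("utf-16-le"))  # -le: no BOM counted
    print(f"{language}: {len(text)} chars, "
          f"UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

For the English sample UTF-8 needs half the bytes of UTF-16; for the Polish sample UTF-8 is still smaller (its marked letters take two bytes each); for the Japanese sample the ratio reverses, since each character costs three bytes in UTF-8 but only two in UTF-16.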