His points, and the points in the serious article he links to, have merit. It is easy to write UCS-2 (or UTF-16, or whatever it is called) code on the erroneous theory that every character is 16 bits. It is hard to make the equivalent assumption with UTF-8, where multi-byte characters turn up constantly. All the 16-bit encoding does is make the outlier conditions rarer, and therefore harder to test.
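To make the point concrete, here is a minimal Python sketch that counts how many code units a few sample characters take in each encoding. The helper name code_units and the choice of sample characters are my own, purely for illustration; nothing here comes from the article.

```python
def code_units(ch: str) -> dict:
    """Return the number of code units needed to encode ch in each encoding."""
    return {
        "utf8_bytes": len(ch.encode("utf-8")),            # 8-bit units
        "utf16_units": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "utf32_units": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }

# Sample characters (hypothetical test data):
# Latin A, precomposed e-acute, euro sign, and an emoji outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

The first three samples each fit in one 16-bit unit, so code that assumes "one character = 16 bits" passes every test built from them. Only the last sample, which lies outside the Basic Multilingual Plane, needs two UTF-16 units (a surrogate pair). In UTF-8 three of the four samples are already multi-byte, so the same mistaken assumption fails almost immediately.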
The article also points out that "what is a character?" is not a simple question, so it is impossible to say that every character is so many bits, even in UTF-32. UTF-32 considers "ch" to be two characters, but to Czech speakers it apparently is only one. Ditto for combining characters: é (hope that makes it through the listserver) may, I believe, legitimately be encoded as two "computer" characters, a base letter followed by a combining accent, but everyone considers it culturally to be a single character.

Charles

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of John McKown
Sent: Friday, October 04, 2013 5:25 AM
To: [email protected]
Subject: OT? A cause to join, but somewhat humorous

http://www.theregister.co.uk/2013/10/04/verity_stob_unicode/

"Down with Unicode!" <grin/>
