On Fri, Feb 01, 2002 at 08:22:54AM -0800, Bernard Miller wrote:
> Hello,
> For those of you not already on the Unicode mailing list I thought you would
> like to be aware of www.bytext.org. Bytext has a much better design than
> Unicode and is a better long term solution. One of the main features is that
> it is designed to be searchable with fast 8 bit regular expression
> algorithms. You may want to build in some flexibility to deal with Bytext in
> your implementation of UTF-8, perhaps even give up on UTF-8 altogether if
> it's possible for you to focus on the long term.

And for those of you not already on the Unicode mailing list, let me give you a brief summary from my point of view:

Bytext is a very complex character encoding standard, offering little to nothing over Unicode, and losing key features of Unicode like combining characters and ASCII compatibility. (A minor pet peeve is that the author left out FORM FEED, because "Since FF looks like a PEC in screen display it can produce unanticipated results...". Really? Vim shows form feeds quite nicely, and when I put a form feed in, I usually expect the results.) I can't really explain more because of the next point.

The author of Bytext shows neither political savvy nor typographic skill. He does not offer his standard in a plain text format, instead choosing to offer it in Microsoft Word and PDF formats. However, he doesn't take advantage of those formats, giving us a badly formatted document with headers and main points poorly marked or left unmarked. Furthermore, the writing style is very hard to read, littering the document with newly created acronyms and spending time attacking Unicode that should have been spent explaining the standard. He also shows algorithms through Java code instead of writing them out. After quickly reading through the document, I had no idea what a properly formed Bytext string would look like, and I didn't see any examples showing me one. It needs an editor more than any work I've ever seen.

(Another pet peeve would be insisting that every programming language add a type uByte for Bytext. Many languages have an existing byte type, and many languages and programmers would find uByte a hideous name clashing with the rest of the language. This is yet another place where the author wants the world to change to fit Bytext instead of Bytext working with the world.)

I find ISO-2022, Tron and Rosetta to be interesting, but I can't even say that about Bytext.
Maybe after some serious editing, some interesting ideas might surface, but the complexity would still make it unusable.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing
with the youth. -- Information Society, "Peace and Love, Inc."
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/