"Bernard Miller" wrote on 2002-02-01 16:22 UTC:

> Hello,
>
> For those of you not already on the Unicode mailing list I thought you would
> like to be aware of www.bytext.org. Bytext has a much better design than
> Unicode and is a better long term solution. One of the main features is that
> it is designed to be searchable with fast 8 bit regular expression
> algorithms. You may want to build in some flexibility to deal with Bytext in
> your implementation of UTF-8, perhaps even give up on UTF-8 altogether if it's
> possible for you to focus on the long term.

UCS has quite a number of historic oddities, no doubt, but at least we
understand them rather well now and they are reasonably easy to work
around. The way we have started to use UTF-8 on POSIX, GNU, Perl, etc.
systems already fixes many of the same problems that Bytext tries to fix.
Therefore, I don't see Bytext offering any practical advantage significant
enough to make it a serious alternative to UTF-8.

I suspect that in our lifetime there was only one realistic moment in
history to get the entire industry to agree on a single coded character
set architecture, and I fear Bytext comes almost exactly 10 years too late.

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/