On Sun, Feb 03, 2002 at 05:57:28AM -0800, Bernard Miller wrote:
> Bytext can be thought of as an exercise in massive precomposition, an
> attempt to eliminate the need for combining characters and formatting
> characters and grapheme clusters. Precomposition is the spirit of the
> W3C character model, Bytext simply takes this to it's logical
> conclusion.
First, "its" is a possessive pronoun; "it's" is a contraction of "it is".

> It simplifies many text processes, especially for syllable oriented
> scripts like Devanagari. It may seem to involve too many characters,
> but it is finite and thus considerably less than the infinite number
> of abstract characters in Unicode.

It's no easier to deal with a very large number of characters than to
deal with an infinite number of characters.

> About people having an emotional attachment to Unicode, I'm not
> necessarily referring to people on this thread. Perhaps David has
> emotional issues with bad typography, maybe he was abused as a child
> by poor documentation ;-)

It's unprofessional. The only English book I have that is as hard to
read as your standard is "Winning Chess Openings", and its terminology
is standard for the field.

> or the knee-jerk ridicule of new characters I proposed which later
> received serious consideration by other members;

You propose new smilies, and expect to be taken seriously? Propose
something of serious use - Old Hungarian, say - and people will respond
better.

> or the many people who took offense at the mere implication that they
> should find it interesting?

It's the Waco Kid syndrome. When every idiot with a pair of six-shooters
is challenging you to a fight, it gets a little annoying; when it's done
by someone who's clearly out of his league (no serious support, for
example), most people don't want to waste their time even looking at it.

> Character encoding as a science is kind of like arithmetic, one
> doesn't expect a lot of major new developments--but things like
> lambda calculus still come along many years later.

And while Lisp 1.0 used lambda calculus to do arithmetic, Lisp 1.5 added
arithmetic primitives, since using lists for arithmetic was so
incredibly slow.

> If someone implementing an arithmetic library doesn't even find
> lambda calculus interesting and refuses to even read about it,

Why would they care?
Lambda calculus has nothing to do with what they're doing.

> As for ASCII transparency (a more appropriate word than compatibility)
> and the general notion of how complex Bytext is compared to Unicode,
> there are 2 important concepts to take note of: The first is that
> making things easier for the user will USUALLY involve making things
> more difficult for the developer. You can't expect a user to shed a
> tear for a developer, the user simply wants the best thing possible.

If you make stuff complex for the developer, there will be more bugs in
implementations, and it will be harder to move data from one
implementation to another, due to differing interpretations. In extreme
cases, forcing developers to use more complex tools means that some
programs will never get written, because it's just not worth the time.
There are a thousand different implementations of ISO-2022, and no two
of them agree. The only consistency is in small subsets like
ISO-2022-JP, which are nowhere near being a universal charset.

> I propose that fast and intuitive regular expressions are a feature
> that will not lose importance because no matter how fast computers
> get, the amount of data that needs to be searched can easily grow
> even faster.

Computer speed increases at the same rate as, or faster than, storage
space. (My first computer, a 386, had a 60 MB hard drive and 1 bogomip
out of the box. My current computer, a PIII, had a 20 GB hard drive and
450 bogomips out of the box.) In any case, text searches are not the
be-all and end-all of text. I'd say that word processing and basic
communication (HTML, email, IM) are far more important.

> In absolute terms of complexity, Bytext is much simpler than Unicode.

Really.

> East Asian Width properties go from being described in an entire
> technical report with 6 properties to being equivalently described by
> a single paragraph and a single property.

Then it's buggy.
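For what it's worth, those six width classes are directly observable by
anyone with a Python interpreter handy; a minimal sketch using the
standard unicodedata module (the sample characters are my own choices,
one per class):

```python
import unicodedata

# One sample character for each of the six UAX #11 East Asian Width
# classes; unicodedata.east_asian_width() returns the class name.
samples = {
    "Ａ": "F",   # FULLWIDTH LATIN CAPITAL LETTER A -> Fullwidth
    "ｱ": "H",    # HALFWIDTH KATAKANA LETTER A      -> Halfwidth
    "漢": "W",   # a CJK ideograph                  -> Wide
    "A": "Na",   # an ASCII letter                  -> Narrow
    "Ω": "A",    # GREEK CAPITAL LETTER OMEGA       -> Ambiguous
    "Ꮀ": "N",    # CHEROKEE LETTER HO               -> Neutral
}

for ch, expected in samples.items():
    assert unicodedata.east_asian_width(ch) == expected

print("all six width classes accounted for")
```

Collapsing those six states into one property is where the bug lies.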
The reason why East Asian Width has 6 properties is because there are
about 6 states that a character can be in with respect to East Asian
Width.

> Consider the many Unicode technical reports, the 850 page book, the
> many files of the Unicode database..

It's a thousand page book, but 600 of those pages are the glyphs and
names you didn't bother to provide, and many of the rest provide clear
explanations of scripts and their histories. All stuff any serious
standard will have to supply. Many of the technical reports are things
you didn't supply with Bytext: Script Names, standard EBCDIC-compatible
encodings, a locale-sensitive collation scheme.

> Truly, it is hard to imagine how Unicode could be made any more
> complex.

And yet, quickly after picking up Unicode, most of us could encode the
below string in UTF-16:

<FE><FF><13><B0><13><B5><00><20>...

> How about an example? Say, "ᎰᎵ hat Musik gut gehört." What does that
> look like bytewise in Bytext?

After reading the Bytext standard three times, I still don't know how
to encode that in Bytext.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing
with the youth. -- Information Society, "Peace and Love, Inc."

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
