On Sun, Feb 03, 2002 at 05:57:28AM -0800, Bernard Miller wrote:
> Bytext can be thought of as an exercise in massive precomposition, an
> attempt to eliminate the need for combining characters and formatting
> characters and grapheme clusters. Precomposition is the spirit of the
> W3C character model, Bytext simply takes this to it's logical
> conclusion.
First, "its" is a possessive pronoun; "it's" is a contraction of "it is".

> It simplifies many text processes, especially for syllable oriented
> scripts like Devanagari. It may seem to involve too many characters,
> but it is finite and thus considerably less than the infinite number
> of abstract characters in Unicode.

It's no easier to deal with a very large number of characters than to
deal with an infinite number of characters.

> About people having an emotional attachment to Unicode, I'm not
> necessarily referring to people on this thread. Perhaps David has
> emotional issues with bad typography, maybe he was abused as a child
> by poor documentation ;-)

It's unprofessional. The only English book I have that is as hard to
read as your standard is "Winning Chess Openings", and its terminology
is standard for the field.

> or the knee-jerk ridicule of new characters I proposed which later
> received serious consideration by other members;

You propose new smilies, and expect to be taken seriously? Propose
something of serious use - Old Hungarian, say - and people will respond
better.

> or the many people who took offense at the mere implication that they
> should find it interesting?

It's the Waco Kid syndrome. When every idiot with a pair of six-shooters
is challenging you to a fight, it gets a little annoying; when it's done
by someone who's clearly out of his league (no serious support, for
example), most people don't want to waste their time even looking at it.

> Character encoding as a science is kind of like arithmetic, one
> doesn't expect a lot of major new developments--but things like
> lambda calculus still come along many years later.

And while Lisp 1.0 used lambda calculus to do arithmetic, Lisp 1.5 added
arithmetic primitives, since using lists for arithmetic was so
incredibly slow.

> If someone implementing an arithmetic library doesn't even find
> lambda calculus interesting and refuses to even read about it,

Why would they care?
Lambda calculus has nothing to do with what they're doing.

> As for ASCII transparency (a more appropriate word than compatibility)
> and the general notion of how complex Bytext is compared to Unicode,
> there are 2 important concepts to take note of: The first is that
> making things easier for the user will USUALLY involve making things
> more difficult for the developer. You can't expect a user to shed a
> tear for a developer, the user simply wants the best thing possible.

If you make stuff complex for the developer, there will be more bugs in
implementations, and it will be harder to move data from one
implementation to another, due to differing interpretations. In extreme
cases, forcing developers to use more complex tools means that some
programs will never get written, because it's just not worth the time.
There are a thousand different implementations of ISO-2022, and no two
of them agree. The only consistency is in small subsets like
ISO-2022-JP, which are nowhere near being a universal charset.

> I propose that fast and intuitive regular expressions are a feature
> that will not lose importance because no matter how fast computers
> get, the amount of data that needs to be searched can easily grow
> even faster.

Computer speed increases at the same rate as, or faster than, storage
space. (My first computer, a 386, had a 60 MB hard drive and 1 bogomip
out of the box. My current computer, a PIII, had a 20 GB hard drive and
450 bogomips out of the box.) In any case, text searches are not the
be-all and end-all of text. I'd say that word processing and basic
communication (HTML, email, IM) are far more important.

> In absolute terms of complexity, Bytext is much simpler than Unicode.

Really.

> East Asian Width properties go from being described in an entire
> technical report with 6 properties to being equivalently described by
> a single paragraph and a single property.

Then it's buggy.
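For what it's worth, those six width classes are directly observable by
anyone with a Python interpreter handy; a minimal sketch using the
standard unicodedata module (the sample characters are my own choices,
one per class):

```python
import unicodedata

# One sample character for each of the six UAX #11 East Asian Width
# classes; unicodedata.east_asian_width() returns the class name.
samples = {
    "Ａ": "F",   # FULLWIDTH LATIN CAPITAL LETTER A -> Fullwidth
    "ｱ": "H",    # HALFWIDTH KATAKANA LETTER A      -> Halfwidth
    "漢": "W",   # a CJK ideograph                  -> Wide
    "A": "Na",   # an ASCII letter                  -> Narrow
    "Ω": "A",    # GREEK CAPITAL LETTER OMEGA       -> Ambiguous
    "Ꮀ": "N",    # CHEROKEE LETTER HO               -> Neutral
}

for ch, expected in samples.items():
    assert unicodedata.east_asian_width(ch) == expected

print("all six width classes accounted for")
```

Collapsing those six states into one property is where the bug lies.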
The reason why East Asian Width has 6 properties is because there are
about 6 states that a character can be in with respect to East Asian
Width.

> Consider the many Unicode technical reports, the 850 page book, the
> many files of the Unicode database..

It's a thousand page book, but 600 of those pages are the glyphs and
names you didn't bother to provide, and many of the rest provide clear
explanations of scripts and their histories. All stuff any serious
standard will have to supply. Many of the technical reports are things
you didn't supply with Bytext: Script Names, standard EBCDIC-compatible
encodings, a locale-sensitive collation scheme.

> Truly, it is hard to imagine how Unicode could be made any more
> complex.

And yet, quickly after picking up Unicode, most of us could encode the
below string in UTF-16:

<FE><FF><13><B0><13><B5><00><20>...

> How about an example? Say, "ᎰᎵ hat Musik gut gehört." What does that
> look like bytewise in Bytext?

After reading the Bytext standard three times, I still don't know how
to encode that in Bytext.

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing
with the youth. -- Information Society, "Peace and Love, Inc."

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
