RE: Giga Character Set: Nothing but noise
Jon Babcock wrote:
> BTW, Marco, as near as I can recall, the above quotation is not from me.

Did it again! Shame on me! Sorry!

_ Marco
RE: Giga Character Set: Nothing but noise
Jon Babcock wrote:
> It seems to me that if not for that, how could anyone make a Chinese font? Who is going to sit down and draw a *myriad* or more characters? Since elements recur, this reduces the amount of labour required greatly.

I too would have bet that all CJK foundries used some form of (automatic?) composition to build their fonts. But, after a few enquiries, it seems that they don't (or they do, but zealously guard the secret).

_ Marco
RE: Giga Character Set: Nothing but noise
On Wed, 18 Oct 2000 [EMAIL PROTECTED] wrote:
> Jon Babcock wrote:
> > It seems to me that if not for that, how could anyone make a Chinese font? Who is going to sit down and draw a *myriad* or more characters? Since elements recur, this reduces the amount of labour required greatly.
>
> I too would have bet that all CJK foundries used some form of (automatic?) composition to build their fonts. But, after a few enquiries, it seems that they don't (or they do, but zealously guard the secret).
> _ Marco

Wednesday, October 18, 2000

If I had to make a guess, it would be that transforming the glyphs of parts of characters so they will fit together in a pleasing fashion would take about as much effort as (or more than) designing separate glyphs for each new character.

Regards,
Jim Agenbroad ( [EMAIL PROTECTED] )
The above are purely personal opinions, not necessarily the official views of any government or any agency of any.
Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.
Re: Giga Character Set: Nothing but noise
> It seems to me that if not for that, how could anyone make a Chinese font? Who is going to sit down and draw a *myriad* or more characters? Since elements recur, this reduces the amount of labour required greatly.

[OT] Are there any character-encoding schemes that have CENTESIMAL DIGIT TEN, CENTESIMAL DIGIT ELEVEN, ... CENTESIMAL DIGIT NINETY-NINE? I had a clock with SEXAGESIMAL DIGITs ZERO through FIFTY-NINE on one wheel. Then I got sick of the noise it made sometimes and ripped the digits out.

It seems to me that in vertical text, what would be better than "san zen go hyaku yon juu hachi nin" or 3 5 4 8 nin would be 35 48 nin, but this is not allowed, is it?

Jon Babcock [EMAIL PROTECTED] wrote:
> "Carl W. Brown" [EMAIL PROTECTED] wrote:
> > If you were to start all over again with no interest in compatibility with existing code pages, you could drop the preformed characters.
>
> Since I've commented about the possibility of using a set of less than 2000 or so characters to represent all Chinese graphs more than once on this mailing list over the past few years, I'll be brief this time. Such a system was developed nearly fifty years ago by Peter A. Boodberg, at the Department of Oriental Languages at the University of California, Berkeley. His work was based directly on a study of Chinese sources, especially the Shuowenjiezi Dictionary. I was fortunate to be able to study under Professor Boodberg during his last couple of years at Berkeley, shortly before his death in 1972. I've rewritten some of his ideas and placed them on my web site (kanji.com) under the name of CHA (Chinese Hemigram Annotation). And because it is difficult to find his original writings on this subject, I intend to host a few of Boodberg's key 'cedules' soon.
>
> When I first heard about Unicode (probably in late 1991), I naively assumed that it would employ some version of the Boodberg approach, i.e., the use of a 'small' subset of Chinese from which the entirety is composed. But, as has been stated many times on this list, the preferred approach was to base the Unicode Han repertoire on lists of precomposed hanzi/hanja/kanji that were actually in use in computers and, for the most part, were sanctioned by national governments. This was natural given that the details (and here the details mean everything) of a system such as the one Dr. Boodberg envisioned were probably not available to the Unicode people, nor were they in use by any national, commercial, or even academic body. In other words, such an approach would have had to be developed by what came to be known as the Unicode Consortium itself.
>
> Although difficult, I believe that within the decade the composition of the Chinese script will be recognized and well understood, and the option to treat each of the tens of thousands of Chinese graphs, including new ones but excluding of course the 300 or so unsegmentable wen, as a digraph that can be decomposed into hemigrams will be made available, perhaps even in Unicode. In the meantime, vis-a-vis Unicode and the Han repertoire, it's a case of 'get over it'. I had to.
>
> Jon
> --
> Jon Babcock [EMAIL PROTECTED]

___
Get your own FREE Bolt Onebox - FREE voicemail, email, and fax, all in one place - sign up at http://www.bolt.com
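[Editor's note: the decomposition Babcock describes has a partial echo in Unicode itself. Unicode 3.0 added the Ideographic Description Characters (U+2FF0..U+2FFB), which let a text *describe* a hanzi as an arrangement of encoded parts, though an IDS is a description, not a rendered composed glyph. A minimal sketch of the idea:]

```python
# Ideographic Description Sequences (Unicode 3.0, U+2FF0..U+2FFB):
# an IDS spells out how a CJK ideograph is built from simpler parts.
IDC_LEFT_TO_RIGHT = "\u2FF0"   # ⿰ side-by-side arrangement
IDC_ABOVE_TO_BELOW = "\u2FF1"  # ⿱ stacked arrangement

def ids_left_right(left: str, right: str) -> str:
    """Describe a character composed of two side-by-side parts."""
    return IDC_LEFT_TO_RIGHT + left + right

# 林 (U+6797, 'forest') is two 木 (U+6728, 'tree') side by side;
# its description sequence is ⿰木木 -- three code points in all.
desc = ids_left_right("\u6728", "\u6728")
print(desc)       # ⿰木木
print(len(desc))  # 3
```

This only scratches at what a full hemigram system would require (Boodberg's scheme is not specified here), but it shows the "small subset from which the entirety is composed" idea living on inside Unicode.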
RE: Giga Character Set: Nothing but noise
At 18:30 -0800 2000-10-14, Doug Ewell wrote:
> Yes, but 1500 times faster? I don't know if 11-Digit Boy was right about using Intercal, but their Unicode implementation must have been really slow.

Speed is an issue, it seems. The two third-party Mac demos that use the Unicode keyboards under Mac OS 9 are very slow indeed.

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
RE: Giga Character Set: Nothing but noise
Michael,

Windows NT has had this problem. However, all Unicode applications can run at close to the same speed. It took years to get there, and the battle will not be won until Windows Me is replaced, so that there is an all-Unicode platform and a new crop of applications is written as pure Unicode applications. It is largely a chicken-and-egg issue.

Carl

-----Original Message-----
From: Michael Everson [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 15, 2000 5:22 AM
To: Unicode List
Subject: RE: "Giga Character Set": Nothing but noise

At 18:30 -0800 2000-10-14, Doug Ewell wrote:
> Yes, but 1500 times faster? I don't know if 11-Digit Boy was right about using Intercal, but their Unicode implementation must have been really slow.

Speed is an issue, it seems. The two third-party Mac demos that use the Unicode keyboards under Mac OS 9 are very slow indeed.

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
RE: Giga Character Set: Nothing but noise
Michael Everson [EMAIL PROTECTED] wrote:
> Speed is an issue, it seems. The two third-party Mac demos that use the Unicode keyboards under Mac OS 9 are very slow indeed.

and "Carl W. Brown" [EMAIL PROTECTED] responded:
> Windows NT has had this problem. However all Unicode applications can run at close to the same speed. It took years to get there and the battle will not be won until Windows Me is replaced so that there is an all-Unicode platform and a new crop of applications is written as pure Unicode applications.

But none of this proves that Unicode is inherently slower or less efficient than any competing character encoding, only that an optimized solution is better than an unoptimized, hybrid one.

-Doug Ewell
Fullerton, California
RE: Giga Character Set: Nothing but noise
"Carl W. Brown" [EMAIL PROTECTED] wrote:
> The problem with languages like Korean is that they are carrying a lot of history. Today with the newer font technology there is no reason to have preformed characters. If you were to start all over again with no interest in compatibility with existing code pages, you could drop the preformed characters.

Yes, I agree that it is more sensible (at least for some purposes) to use jamos for Hangul rather than allocating 11,000 code points for precomposed characters. Of course, we all know that compatibility with existing code pages was a deliberate design decision, without which Unicode would have been much less likely to succeed.

> This may be what they are talking about being more efficient.

Yes, but 1500 times faster? I don't know if 11-Digit Boy was right about using Intercal, but their Unicode implementation must have been really slow.

> You can come close to selecting han based on radicals. They probably have a way to select among duplicate matches. Then you could cut the character set down to bopomofo or even the Latin pinyin.

I don't know enough about Chinese input methods to comment on the rest of this, but if Unicode had implemented anything that merely "came close," you would never hear the end of how inadequate it was.

-Doug Ewell
Fullerton, California
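[Editor's note: the jamo-versus-precomposed trade-off is made concrete by Unicode's own design. The 11,172 precomposed Hangul syllables at U+AC00..U+D7A3 are laid out so that a syllable's code point is simple arithmetic over its leading-consonant, vowel, and optional trailing-consonant jamo, per the Unicode standard's Hangul composition formula. A minimal sketch:]

```python
# Hangul syllable composition per the Unicode standard:
#   syllable = SBase + (L_index * VCount + V_index) * TCount + T_index
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose conjoining jamo L + V (+ optional T) into one precomposed syllable."""
    l_idx = ord(lead) - L_BASE
    v_idx = ord(vowel) - V_BASE
    t_idx = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_idx * V_COUNT + v_idx) * T_COUNT + t_idx)

# CHOSEONG HIEUH (U+1112) + JUNGSEONG A (U+1161) + JONGSEONG NIEUN (U+11AB)
print(compose_hangul("\u1112", "\u1161", "\u11AB"))  # 한 (U+D55C)
```

Because the mapping is pure arithmetic, either representation converts to the other in constant time, which is one reason the duplication is tolerable in practice.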
Re: Giga Character Set: Nothing but noise
"Carl W. Brown" [EMAIL PROTECTED] wrote:
> If you were to start all over again with no interest in compatibility with existing code pages, you could drop the preformed characters.

Since I've commented about the possibility of using a set of less than 2000 or so characters to represent all Chinese graphs more than once on this mailing list over the past few years, I'll be brief this time.

Such a system was developed nearly fifty years ago by Peter A. Boodberg, at the Department of Oriental Languages at the University of California, Berkeley. His work was based directly on a study of Chinese sources, especially the Shuowenjiezi Dictionary. I was fortunate to be able to study under Professor Boodberg during his last couple of years at Berkeley, shortly before his death in 1972. I've rewritten some of his ideas and placed them on my web site (kanji.com) under the name of CHA (Chinese Hemigram Annotation). And because it is difficult to find his original writings on this subject, I intend to host a few of Boodberg's key 'cedules' soon.

When I first heard about Unicode (probably in late 1991), I naively assumed that it would employ some version of the Boodberg approach, i.e., the use of a 'small' subset of Chinese from which the entirety is composed. But, as has been stated many times on this list, the preferred approach was to base the Unicode Han repertoire on lists of precomposed hanzi/hanja/kanji that were actually in use in computers and, for the most part, were sanctioned by national governments. This was natural given that the details (and here the details mean everything) of a system such as the one Dr. Boodberg envisioned were probably not available to the Unicode people, nor were they in use by any national, commercial, or even academic body. In other words, such an approach would have had to be developed by what came to be known as the Unicode Consortium itself.

Although difficult, I believe that within the decade the composition of the Chinese script will be recognized and well understood, and the option to treat each of the tens of thousands of Chinese graphs, including new ones but excluding of course the 300 or so unsegmentable wen, as a digraph that can be decomposed into hemigrams will be made available, perhaps even in Unicode. In the meantime, vis-a-vis Unicode and the Han repertoire, it's a case of 'get over it'. I had to.

Jon
--
Jon Babcock [EMAIL PROTECTED]
Re: Giga Character Set: Nothing but noise
I see I was *doubly* "brief". Sorry for the duplicate message. Jon -- Jon Babcock [EMAIL PROTECTED]
Re: Giga Character Set: Nothing but noise
John Jenkins [EMAIL PROTECTED] wrote:
> Have we figured out yet what part of "Hamlet" the Giga people claim cannot be encoded in Unicode? I had to do some head scratching on that one. I finally figured out that it was meant rhetorically. Would the inability to encode Hamlet be acceptable? No. So why foist on the world a character set (viz., Unicode) that can't handle Chinese properly? Isn't Chinese as important as English? At least, I think that's what they meant.

Yes, I finally figured that out after reading the white paper and doing a general Web search on Coventive and their "Giga Character Set." As Ken pointed out, they are based in Taiwan and have the usual focus on "efficient" CJK encoding and language-specific Han glyphs, along with a deep conviction that Western-based organizations couldn't possibly get this stuff right if they tried.

In the white paper, they tip their hand by continually referring to "display codes," as if displaying glyphs were the only thing character codes were used for. (What about input, storage, comparison, collation, etc.?)

There are several misstatements about Unicode, ranging from merely ignorant to -- David Starner had the right word for it -- outright slanderous. First, of course, is the premise that "16-bit" Unicode has room for only 65,536 characters. Most of the perceived shortcomings of Unicode are based on this falsehood and can be quickly dismissed. There is also a statement that contiguous ranges of Unicode code units are assigned to languages, when in fact Unicode maintains a studied ignorance of language and doesn't even require all characters in the same script to be encoded in the same block. Of course, there is the usual claim that "Unicode can not easily include the new characters that continue to be formed." Try telling someone who was in Boston or Athens recently that Unicode's rigid structure doesn't permit the addition of new characters!

Then, another news flash: Unicode doesn't provide for the reality that "the directionality of written language can vary." So I guess that means the Bidirectional Algorithm, the Bidirectional Category field in UnicodeData.txt, the directional override codes, etc. don't actually exist.

You gotta love the separate, proprietary, *patented* algorithms that are created to handle each specific language's "peculiarities." Note how English, French, Spanish, German, Italian, and Portuguese -- all at least 98% covered by Latin-1 -- each have their own GCS encodings. When do you suppose we will see the Basque, Sami, Azeri, Yi, Thaana, etc. algorithms? When Coventive unilaterally decides to support them? (Ah, but they have thrown in Klingon, just to prove it can be done.)

And, of course, Coventive claims to have improved display performance dramatically -- 1500x for Korean! -- by composing glyphs dynamically from component pieces rather than referencing a precomposed glyph from a "behemoth look-up table." (Do they think some kind of search must take place to locate the glyph for code point U+mumble?) Conveniently ignored are the fact that not all CJK characters are decomposable in this way, the severe performance hit imposed on searching and sorting, and the fact that an approach like this would only work for CJK in any event.

An article in the October 12, 2000 issue of Linux Weekly News http://lwn.net/bigpage.php3 tries to explain the benefit: "Many Asian characters are composites, made up of one or more simpler characters. Unicode simply makes a big catalog of characters, without recognizing their internal structure; GCS apparently handles things in a more natural manner." However, the article does not go on to specify just what is better, more efficient, or more "natural" about the GCS approach. (BTW, an article in the online Taipei Times mentioned that GCS assigns 4 bytes for each code point. So who's inefficient now?)

I am sorely tempted to point out that their criticism of CJK glyph unification in Unicode could be addressed by judicious use of Plane 14 tags, but no matter; Giga is DOA. It is false economy, it attempts to solve perceived CJK problems by introducing bogus distinctions, it considers only one aspect of character-code processing (display) while ignoring all others, and it is the patented, proprietary work of one company. We will never have to worry about Giga, and in a year or so we will forget it ever existed.

-Doug Ewell
Fullerton, California
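[Editor's note: two of the claims rebutted above can be checked arithmetically. UTF-16 reaches beyond 65,536 characters via surrogate pairs, and a common CJK character costs 2 bytes in UTF-16 or 3 in UTF-8, versus the 4 bytes per code point GCS reportedly uses. A minimal sketch; Deseret U+10400 is chosen as an arbitrary supplementary-plane example:]

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary-plane code point (> U+FFFF) into its UTF-16 surrogate pair."""
    assert cp > 0xFFFF
    v = cp - 0x10000                      # 20-bit value, 0..0xFFFFF
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

# U+10400 DESERET CAPITAL LETTER LONG I lies beyond the supposed 16-bit limit:
print([hex(u) for u in to_surrogate_pair(0x10400)])  # ['0xd801', '0xdc00']

# Per-character storage cost for 林 (U+6797), a BMP CJK character:
print(len("\u6797".encode("utf-16-le")))  # 2 bytes in UTF-16
print(len("\u6797".encode("utf-8")))      # 3 bytes in UTF-8
```

So for ordinary CJK text, both standard Unicode encoding forms already undercut a fixed 4-byte-per-character scheme.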