Martin Duerst <[EMAIL PROTECTED]> wrote:

> > I think each Hangul character carries the information of only about
> > 1.5 English letters,
>
> It may be lower than Chinese, but I'm very surprised it should be that
> low.

You're right.  My estimate must have been based on an anomalous sample.
Here are the counts for Genesis chapter 1:

    King James:      3167 letters
    Basic English:   3088 letters
    Chinese Union:    778 ideographs
    Korean Revised:  1201 Hangul

references:
    http://www.ccim.org/bible/
    http://bible.wisenet.co.kr/

So it's about 4.0 English letters per Chinese ideograph, and about 2.6
English letters per Korean Hangul.  Each Korean Hangul takes about 2.9
octets in AMC-ACE-Z, which means a maximal Korean domain label (20
Hangul) holds about as much information as a 52-letter English string,
which is about 17% less information than a maximal English domain label
(63 letters), and about 38% less information than a maximal Chinese
domain label (19 ideographs).

I now retract this statement:

> Of all the languages I've looked at, Korean is by far the least dense
> when encoded using AMC-ACE-Z.

In light of the new data, I doubt that Korean is the least dense.

AMC
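
For anyone who wants to re-check the per-character ratios and the Korean-versus-English comparison, here is a minimal Python sketch. It uses only the Genesis 1 counts quoted above. Averaging the two English translations to get a single letter count is an assumption of this sketch (the post does not say how the ratios were derived), and the function and variable names are purely illustrative.

```python
# Character counts for Genesis chapter 1, as quoted in the post.
COUNTS = {
    "King James": 3167,      # English letters
    "Basic English": 3088,   # English letters
    "Chinese Union": 778,    # ideographs
    "Korean Revised": 1201,  # Hangul syllables
}

MAX_ENGLISH_LABEL = 63  # letters in a maximal English label (from the post)
MAX_KOREAN_LABEL = 20   # Hangul in a maximal Korean label (from the post)


def letters_per_char(char_count, counts=COUNTS):
    """English letters per character of another script.

    Assumption: the English baseline is the mean of the two English
    translations; the post does not state which baseline it used.
    """
    english = (counts["King James"] + counts["Basic English"]) / 2
    return english / char_count


per_ideograph = letters_per_char(COUNTS["Chinese Union"])
per_hangul = letters_per_char(COUNTS["Korean Revised"])

# English-letter equivalent of a maximal Korean label, and how far it
# falls short of a maximal English label.
korean_equiv = MAX_KOREAN_LABEL * per_hangul
shortfall_vs_english = 1 - korean_equiv / MAX_ENGLISH_LABEL

print(f"letters per ideograph: {per_ideograph:.1f}")
print(f"letters per Hangul:    {per_hangul:.1f}")
print(f"maximal Korean label:  about {korean_equiv:.0f} English letters")
print(f"shortfall vs English:  about {shortfall_vs_english:.0%}")
```

The ratios are not sensitive to the choice of baseline: using either English translation alone instead of the mean gives the same figures after rounding to one decimal place.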
