----- Original Message -----
From: "David Hopwood" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, November 13, 2001 5:36 PM
Subject: Re: [idn] reordering strawpoll

> -----BEGIN PGP SIGNED MESSAGE-----
>
> Soobok Lee wrote:
> > From: <[EMAIL PROTECTED]>
> > > In a message dated 2001-11-12 14:41:29 Pacific Standard Time,
> > > [EMAIL PROTECTED] writes:
> > >
> > > > If you encode each Hangul syllabic in 3 jamos in utf8,
> > > > it need 3 octets * 3 = 9 octets, while 3 basic latin letter need 3 octets
> > > > in utf8. 3 times more space! if there were any real "compaction" on
> > > > hangul syllable code points, that may be just the bare minimum.
> > >
> > > But one paragraph earlier, Soobok stated that each hangul character is
> > > roughly equivalent to (i.e. carries roughly as much information as) 2.2 to
> > > 2.7 Latin letters. So the 9 octets of UTF-8 actually encode the equivalent
> > > of 6.6 to 8.1 Latin letters, which means Hangul encoding is 10% to 27% less
> > > efficient than Latin encoding. Representing it as two-thirds (67%) less
> > > efficient is obviously misleading. Such claims only detract attention away
> > > from any merit the reordering plan may have.
> >
> > my analogy cited above was for *UTF8*.
>
> However, your argument that it is important to reduce the *average* length
> of encoded names, certainly doesn't apply to UTF-8 (even if it's accepted
> that it applies to ACE, which I don't accept).

Yes. That argument was only meant to justify adding the Hangul syllable
code block in addition to the Hangul jamo (alphabet) block:
a 9 octets -> 3 octets "compaction".

> Users will never see (much less type in) UTF-8 octet string encodings
> except in obscure debugging situations.

But without the Hangul syllable block, users would suffer 3 times more
resource consumption for each Unicode Hangul syllable.
6 Hangul syllables (6 * 3 * 3 = 54 octets) nearly fill the 63-octet
UTF-8 limit! That's why the Hangul jamo block alone is not enough to
encode Hangul efficiently.

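The octet counts above can be checked with a small Python sketch. This is only an illustration: the names LABEL_LIMIT and utf8_len are made up here, and the sample syllable U+D55C is an arbitrary three-jamo choice. NFD normalization is used as a stand-in for "encoding the syllable as jamos". By this arithmetic, up to 7 fully decomposed three-jamo syllables would fit exactly in 63 octets, slightly more than the 6 cited above, against 21 precomposed syllables.

```python
import unicodedata

# Assumption: the 63-octet label limit discussed in this thread.
LABEL_LIMIT = 63


def utf8_len(text: str) -> int:
    """Return the number of octets in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


# U+D55C HANGUL SYLLABLE HAN, one precomposed code point.
syllable = "\uD55C"

# NFD splits a precomposed syllable into its conjoining jamos
# (here: U+1112, U+1161, U+11AB).
jamos = unicodedata.normalize("NFD", syllable)

precomposed_octets = utf8_len(syllable)  # one code point in the syllable block
decomposed_octets = utf8_len(jamos)      # three code points in the jamo block

# How many such syllables fit in one label, in each representation.
max_precomposed = LABEL_LIMIT // precomposed_octets
max_decomposed = LABEL_LIMIT // decomposed_octets

print("jamo code points:", [f"U+{ord(ch):04X}" for ch in jamos])
print("octets, precomposed:", precomposed_octets)
print("octets, decomposed:", decomposed_octets)
print("syllables per label, precomposed:", max_precomposed)
print("syllables per label, decomposed:", max_decomposed)
```

Syllables without a final consonant decompose into only 2 jamos (6 octets), so the real average would sit somewhere between the two figures.
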
Soobok Lee

> The 63-octet limitation on label
> length when using UTF-8 appears to be sufficient, AFAICS (even for Indic
> scripts, which are less efficient than Hangul in UTF-8).
>
> - --
> David Hopwood <[EMAIL PROTECTED]>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO/DbMjkCAxeYt5gVAQGXbwgAouMXtfu/AZi+OBm0R2CwjHc+2UFPMYk+
> qK1GktXy1WDSLlx+EV3brdlHxaQsE51ryfd2eBoHjNpdXujkG44JFgNcqI4UgV6r
> fHIfM6zYJqNOZaQlq2o7HmOxr32WKjgtIRwds7src9rXZ6pZGHsx3V1dIyZvg69X
> U2lzG7Jh7mEuDjQakrybwi+43ZN/Fb3J7xd7AGT/knPPGh1xZN0mZYDayTbUjMD4
> mIiKYri6DVcNL/mlgx3mIuaCVXPiluxCSZqT8jSCSkovWsmCpzT3e0F2/B6YwxHu
> fTilBfNOYjS7njLg+5uhllKxMk8qBlivnlYdPG34AFv26hoRn9yCaw==
> =nQjs
> -----END PGP SIGNATURE-----
>
