Hi, Adams and all

AMC-ACE-Z + REORDERING is more efficient than DUDE + REORDERING,
at least for Chinese/Hangul. I am testing other scripts, too.

For long Chinese/Hangul domains, the LAMCZ label length is
approximately (1.95~2.15) * (number of code points).

In terms of label length, LAMCZ is 11.30% more efficient than LDUDE
for the 285108 Chinese ML.com samples, and 4.29% more efficient than
LDUDE for the 207207 Hangul ML.com samples.

LAMCZ is the most efficient encoding that I have tested so far.
(I have not tested MACE or ACE37 yet; please wait a while.)
(I excluded the Latin ranges from REORDERING because of AMCZ's literal mode.)

Cheers,
Soobok Lee

-------------------------------------------------------------------------

For Chinese ML.com samples.

N: length of a domain label (# of code points)
FREQ: number of domains of length N
N*FREQ: sum of # of code points of domains of length N
SUM OF AMCZ: sum of lengths of AMCZ labels
X: SUM OF AMCZ / (N * FREQ)
SUM OF LAMCZ: sum of lengths of LAMCZ labels
Y: SUM OF LAMCZ / (N * FREQ)
COMP: (SUM OF AMCZ - SUM OF LAMCZ) / SUM OF AMCZ * 100

|  N|   FREQ|  N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)|  COMP|

|  1|   4642|    4642|    15804(3.40)|     14807(3.19)|  6.31|
|  2|  59708|  119416|   401549(3.36)|    352022(2.95)| 12.33|
|  3|  49471|  148413|   484456(3.26)|    415104(2.80)| 14.32|
|  4|  99402|  397608|  1269398(3.19)|   1034646(2.60)| 18.49|
|  5|  29974|  149870|   467070(3.12)|    381651(2.55)| 18.29|
|  6|  20809|  124854|   384013(3.08)|    304635(2.44)| 20.67|
|  7|   8860|   62020|   186347(3.00)|    147111(2.37)| 21.06|
|  8|   5251|   42008|   124325(2.96)|     97303(2.32)| 21.73|
|  9|   2666|   23994|    69234(2.89)|     54697(2.28)| 21.00|
| 10|   2008|   20080|    57887(2.88)|     44270(2.20)| 23.52|
| 11|    859|    9449|    26914(2.85)|     20836(2.21)| 22.58|
| 12|    671|    8052|    22819(2.83)|     17294(2.15)| 24.21|
| 13|    346|    4498|    12217(2.72)|      9581(2.13)| 21.58|
| 14|    235|    3290|     9084(2.76)|      6933(2.11)| 23.68|
| 15|    117|    1755|     4723(2.69)|      3721(2.12)| 21.22|
| 16|     68|    1088|     2884(2.65)|      2258(2.08)| 21.71|
| 17|     21|     357|      911(2.55)|       704(1.97)| 22.72|

|   | 285108| 1121394|  3539635(3.16)|   2907573(2.59)| 17.86|


For Korean ML.com samples.

N: length of a domain label (# of code points)
FREQ: number of domains of length N
N*FREQ: sum of # of code points of domains of length N
SUM OF AMCZ: sum of lengths of AMCZ labels
X: SUM OF AMCZ / (N * FREQ)
SUM OF LAMCZ: sum of lengths of LAMCZ labels
Y: SUM OF LAMCZ / (N * FREQ)
COMP: (SUM OF AMCZ - SUM OF LAMCZ) / SUM OF AMCZ * 100

|  N|   FREQ|  N*FREQ| SUM OF AMCZ(X)| SUM OF LAMCZ(Y)|  COMP|

|  1|   1941|    1941|     7764(4.00)|      7764(4.00)|  0.00|
|  2|  16978|   33956|   123248(3.63)|    105628(3.11)| 14.30|
|  3|  38852|  116556|   394410(3.38)|    322373(2.77)| 18.26|
|  4|  61642|  246568|   803121(3.26)|    625970(2.54)| 22.06|
|  5|  40375|  201875|   639079(3.17)|    483118(2.39)| 24.40|
|  6|  24561|  147366|   458978(3.11)|    337398(2.29)| 26.49|
|  7|  13034|   91238|   280346(3.07)|    203406(2.23)| 27.44|
|  8|   5596|   44768|   136452(3.05)|     97248(2.17)| 28.73|
|  9|   2421|   21789|    65504(3.01)|     46536(2.14)| 28.96|
| 10|   1033|   10330|    29964(2.90)|     21330(2.06)| 28.81|
| 11|    427|    4697|    13845(2.95)|      9739(2.07)| 29.66|
| 12|    173|    2076|     5905(2.84)|      4261(2.05)| 27.84|
| 13|     96|    1248|     3588(2.88)|      2539(2.03)| 29.24|
| 14|     32|     448|     1331(2.97)|       921(2.06)| 30.80|
| 15|     22|     330|      927(2.81)|       675(2.05)| 27.18|
| 16|     15|     240|      606(2.52)|       471(1.96)| 22.28|
| 17|      8|     136|      378(2.78)|       267(1.96)| 29.37|
| 19|      1|      19|       26(1.37)|        26(1.37)|  0.00|

|   | 207207|  925581|  2965472(3.20)|   2269670(2.45)| 23.46|


----- Original Message -----
From: "Soobok Lee" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, July 10, 2001 10:37 PM
Subject: chinese/hangul ML.com statistics with DUDE/LDUDE

>
> The next table is from
> 285108 chinese ML.com samples (old raw data from VGRS).
>
> The "COMP" column gives the improvement ratio of LDUDE over DUDE.
> The "Y" column shows that for long chinese domains, LDUDE's label
> length is close to (2.0~2.5)*(input domain length).
>
>
> N: length of a domain label (# of code points)
> FREQ: number of domains of length N
> SUM OF DUDE: sum of lengths of DUDE labels
> X: SUM OF DUDE / (N * FREQ)
> SUM OF LDUDE: sum of lengths of LDUDE labels
> Y: SUM OF LDUDE / (N * FREQ)
> COMP: (SUM OF DUDE - SUM OF LDUDE) / SUM OF DUDE * 100
>
> |  N|   FREQ|  N*FREQ| SUM OF DUDE(X)| SUM OF LDUDE(Y)|  COMP|
>
> |  1|   4642|    4642|    18568(4.00)|     18568(4.00)|  0.00|
> |  2|  59708|  119416|   462031(3.87)|    415599(3.48)| 10.05|
> |  3|  49471|  148413|   566440(3.82)|    477649(3.22)| 15.68|
> |  4|  99402|  397608|  1509929(3.80)|   1168378(2.94)| 22.62|
> |  5|  29974|  149870|   554237(3.70)|    426226(2.84)| 23.10|
> |  6|  20809|  124854|   457412(3.66)|    333416(2.67)| 27.11|
> |  7|   8860|   62020|   220880(3.56)|    160563(2.59)| 27.31|
> |  8|   5251|   42008|   146822(3.50)|    103903(2.47)| 29.23|
> |  9|   2666|   23994|    81433(3.39)|     58657(2.44)| 27.97|
> | 10|   2008|   20080|    68385(3.41)|     46708(2.33)| 31.70|
> | 11|    859|    9449|    31596(3.34)|     22111(2.34)| 30.02|
> | 12|    671|    8052|    27039(3.36)|     18135(2.25)| 32.93|
> | 13|    346|    4498|    14306(3.18)|     10088(2.24)| 29.48|
> | 14|    235|    3290|    10676(3.24)|      7230(2.20)| 32.28|
> | 15|    117|    1755|     5568(3.17)|      3854(2.20)| 30.78|
> | 16|     68|    1088|     3383(3.11)|      2376(2.18)| 29.77|
> | 17|     21|     357|     1075(3.01)|       750(2.10)| 30.23|
>
> |   | 285108| 1121394|  4179780(3.73)|   3274211(2.92)| 21.67|
>
>
>
> The next table is from
> 207207 hangul ML.com samples (old raw data from VGRS).
>
> |  N|   FREQ|  N*FREQ| SUM OF DUDE(X)| SUM OF LDUDE(Y)|  COMP|
>
> |  1|   1941|    1941|     7764(4.00)|      7764(4.00)|  0.00|
> |  2|  16978|   33956|   129239(3.81)|    111308(3.28)| 13.87|
> |  3|  38852|  116556|   436845(3.75)|    341333(2.93)| 21.86|
> |  4|  61642|  246568|   915355(3.71)|    653736(2.65)| 28.58|
> |  5|  40375|  201875|   743090(3.68)|    502097(2.49)| 32.43|
> |  6|  24561|  147366|   540245(3.67)|    349710(2.37)| 35.27|
> |  7|  13034|   91238|   332964(3.65)|    211206(2.31)| 36.57|
> |  8|   5596|   44768|   162833(3.64)|    100618(2.25)| 38.21|
> |  9|   2421|   21789|    78945(3.62)|     48633(2.23)| 38.40|
> | 10|   1033|   10330|    36144(3.50)|     22323(2.16)| 38.24|
> | 11|    427|    4697|    16744(3.56)|     10259(2.18)| 38.73|
> | 12|    173|    2076|     7178(3.46)|      4578(2.21)| 36.22|
> | 13|     96|    1248|     4386(3.51)|      2725(2.18)| 37.87|
> | 14|     32|     448|     1656(3.70)|      1006(2.25)| 39.25|
> | 15|     22|     330|     1168(3.54)|       750(2.27)| 35.79|
> | 16|     15|     240|      757(3.15)|       529(2.20)| 30.12|
> | 17|      8|     136|      470(3.46)|       299(2.20)| 36.38|
> | 19|      1|      19|       30(1.58)|        30(1.58)|  0.00|
>
> |   | 207207|  925581|  3415813(3.69)|   2368904(2.56)| 30.65|
>
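
For anyone who wants to re-check the derived columns, here is a minimal
sketch of how X/Y and COMP can be recomputed from the raw sums in the
tables. The function names are hypothetical, not from any ACE tool. COMP
is taken as the relative reduction (base - reordered) / base * 100, which
is the reading that reproduces the positive values shown in the tables.

```python
def chars_per_code_point(sum_len, n, freq):
    """X or Y column: total encoded label length per input code point."""
    return sum_len / (n * freq)


def comp(base_sum, reordered_sum):
    """COMP column: percentage reduction of the reordered encoding
    relative to the base encoding (assumed sign convention)."""
    return (base_sum - reordered_sum) / base_sum * 100


# Chinese samples, N=2 row of the AMCZ/LAMCZ table.
x = chars_per_code_point(401549, 2, 59708)
y = chars_per_code_point(352022, 2, 59708)
print(f"X={x:.2f} Y={y:.2f} COMP={comp(401549, 352022):.2f}")

# LAMCZ versus LDUDE on the table totals (sums of label lengths).
print(f"Chinese: {comp(3274211, 2907573):.2f}%")
print(f"Hangul:  {comp(2368904, 2269670):.2f}%")
```

Computed from the raw totals, the last two figures come out at roughly
11.2% and 4.2%. The 11.30% and 4.29% quoted at the top of this message
are close to what the rounded totals-row ratios give (2.59 vs 2.92 and
2.45 vs 2.56); that derivation is an assumption on my part.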
