Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

2003-11-23 Thread Doug Ewell
Jungshik Shin jshin at mailaps dot org wrote: The file they used, called arirang.txt, contains over 3.3 million Unicode characters and was apparently once part of their Florida Tech Corpus of Multi-Lingual Text but subsequently deleted for reasons not known to me. I can supply it if you're

RE: Ternary search trees for Unicode dictionaries

2003-11-23 Thread Philippe Verdy
Jungshik Shin wrote on Sun 23-Nov-2003 03:51 to Doug Ewell: On Thu, 20 Nov 2003, Doug Ewell wrote: Jungshik Shin jshin at mailaps dot org wrote: The file is all in syllables, not jamos, which I guess means it's in NFC. Yes, it's in NFC, then. The statistics on this file are

Re: How can I have OTF for MacOS

2003-11-23 Thread Christopher John Fynn
Mustafa, with complex scripts like Bangla under Mac OS X I think you have to make AAT fonts rather than OT fonts, though it is possible to include both AAT tables and OT tables in the same font. For tools and specs to do this, try: http://developer.apple.com/fonts/OSXTools.html Christopher J. Fynn

Re: How can I have OTF for MacOS

2003-11-23 Thread mjabbar
Dear Fynn, Thanks for the information. I hope it will help me in developing fonts. I have downloaded the tool. But how can I create a keyboard driver for accessing the fonts? Thanks and regards, Mustafa Jabbar Quoting Christopher John Fynn [EMAIL PROTECTED]: Mustafa With complex scripts

Re: How can I have OTF for MacOS

2003-11-23 Thread John Delacour
At 12:27 am +0600 24/11/03, [EMAIL PROTECTED] wrote: But how can I create a keyboard driver for accessing the fonts? Things have developed a bit since I made a keyboard layout for polytonic Greek, but at the time I used Alex Elenberg's generator. Go to http://wordherd.com/keyboards/ JD

Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

2003-11-23 Thread Doug Ewell
Mark Davis mark dot davis at jtcsv dot com wrote: Of course, no compression format applied to jamos could even do as well as UTF-16 applied to syllables, i.e. 2 bytes per syllable. This needs a bit of qualification. An arithmetic compression would do better, for example, or even just a
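As a rough illustration of the sizes being compared in this thread, the sketch below encodes one Hangul syllable in its precomposed form and in its decomposed jamo form, without any compression applied. It uses Python's standard unicodedata module; the sample syllable and the variable names are my own choices, not taken from the posts.

```python
import unicodedata

# U+D55C HANGUL SYLLABLE HAN, picked only as an example (assumption).
syllable = "\ud55c"

# NFD splits the precomposed syllable into its conjoining jamos.
jamos = unicodedata.normalize("NFD", syllable)

# Byte counts with no compression at all, for both representations.
sizes = {
    "utf16_syllable": len(syllable.encode("utf-16-be")),
    "utf16_jamos": len(jamos.encode("utf-16-be")),
    "utf8_syllable": len(syllable.encode("utf-8")),
    "utf8_jamos": len(jamos.encode("utf-8")),
}
print(len(jamos), sizes)
```

For this syllable the jamo form has three code points, so it starts out three times larger than the 2-byte UTF-16 syllable that a jamo-based compressor would have to beat.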

RE: Ternary search trees for Unicode dictionaries

2003-11-23 Thread Philippe Verdy
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Doug Ewell Sent: Sunday, 23 November 2003 22:06 To: Unicode Mailing List Cc: [EMAIL PROTECTED]; Jungshik Shin Subject: Re: Ternary search trees for Unicode dictionaries Philippe Verdy verdy underscore p at wanadoo dot fr

Re: Ternary search trees for Unicode dictionaries

2003-11-23 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: OK, this is a transform, but it is still canonically equivalent to the source text. Transformations between canonically equivalent strings are safe (at least for Korean Hangul), and this is what any normalizer performs. But compressors
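The canonical equivalence mentioned here can be checked with a small sketch. It decomposes Hangul syllables with the arithmetic mapping given in the Unicode Standard (constants SBase, LBase, VBase, TBase) and compares the result with Python's unicodedata normalizer. The function name decompose_syllable and the sample text are hypothetical, chosen only for illustration.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables


def decompose_syllable(ch):
    """Return the jamo sequence for one Hangul syllable (hypothetical helper)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable, leave unchanged
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


text = "\ud55c\uae00"  # sample text (assumption): two Hangul syllables
decomposed = "".join(decompose_syllable(c) for c in text)

# The arithmetic result should match NFD, and NFC should restore the input.
matches_nfd = decomposed == unicodedata.normalize("NFD", text)
round_trips = unicodedata.normalize("NFC", decomposed) == text
print(len(decomposed), matches_nfd, round_trips)
```

This only demonstrates the round trip for precomposed syllables; it says nothing about text that mixes in other combining sequences.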