Re: Unicode: endpoint of evolution of encodings?

Christopher Fynn Fri, 19 Nov 2004 04:41:24 -0800

hi srintuar

Since this is using "smart" font technology the underlying data characters don't change - though one script (e.g. Cyrillic) is transliterated into another (Latin). All the rules for doing this are built into the font. The lookups would have to be keyed to specific languages, since such transliteration rules would undoubtedly be different from one language to another.

This "transliteration" happens only when the text is displayed, leaving the data characters unchanged, so it causes no loss of information. This feature is only in the AAT / ATSUI font format spec, though you could do the same with Graphite, since Graphite allows you to define your own features. Something similar could probably be done with OpenType, though it would involve changes to the shaping engine (Uniscribe, Pango, etc.) as well as adding tables to the font. You'd also need to get the feature registered.
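
To make the idea concrete, here is a minimal sketch in Python of display-time transliteration keyed by language. It is not how AAT or Graphite implement the feature; the names RULES and render are made up for illustration, the table covers only lowercase Serbian letters, and a real font would express these rules as glyph lookups rather than string replacement.

```python
# Illustrative only: a per-language rule table standing in for the
# language-keyed lookups a smart font would carry.
SR_CYRL = "абвгдђежзијклљмнњопрстћуфхцчџш"
SR_LATN = ["a", "b", "v", "g", "d", "đ", "e", "ž", "z", "i",
           "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r",
           "s", "t", "ć", "u", "f", "h", "c", "č", "dž", "š"]

# Hypothetical registry: language tag -> transliteration table.
RULES = {"sr": dict(zip(SR_CYRL, SR_LATN))}


def render(text, lang):
    """Return the displayed form of text; the input string is never modified."""
    table = RULES.get(lang, {})  # no rules for this language: show text as stored
    return "".join(table.get(ch, ch) for ch in text)


stored = "њива"                 # the data characters, as stored
shown = render(stored, "sr")    # what the reader sees
print(stored, "->", shown)
```

The stored string stays Cyrillic throughout; only the rendered form differs, and a language tag without a table falls back to showing the text as stored.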


srintuar wrote:

Christopher Fynn wrote:
The Transliteration feature type allows text in one format to be displayed using another format. An example is taking a hiragana string and displaying it as katakana. This is an exclusive feature type.
    Currently defined selectors for this feature are:
          o Hiragana to Katakana
          o Katakana to Hiragana
          o Kana to Romanization
          o Romanization to Hiragana
          o Romanization to Katakana

There is no one "right" way to perform these projections.
Also, they are not necessarily reflexive (meaning they
lose information: you couldn't recover the original text
from the transformed text in some cases).
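
The point about losing information can be shown with a tiny Python sketch. The table below is a made-up four-entry sample, not a full romanization scheme, and romanize_flat is a hypothetical helper name.

```python
# Illustrative sample only: hiragana and katakana map to the same Latin
# letters, so the mapping cannot be inverted without extra information.
KANA_TO_ROMAJI = {
    "か": "ka", "な": "na",   # hiragana
    "カ": "ka", "ナ": "na",   # katakana
}


def romanize_flat(text):
    """Context-free, character-by-character romanization."""
    return "".join(KANA_TO_ROMAJI.get(ch, ch) for ch in text)


hira = romanize_flat("かな")
kata = romanize_flat("カナ")
print(hira, kata, hira == kata)
```

Two different source strings give the same output, so the original cannot be recovered if the transformed text is all that is stored. That is only a problem when the transformed text replaces the data, not when it is produced at display time.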


You'd have to encode all the rules for Serbian Cyrillic to
Serbian Latin transliteration into the font. This only
results in glyph (display) transformations from one script
to another, leaving the underlying data characters
untouched, so there is no information loss.

There is no way you could encode such information into a
font face itself by displaying alternate glyphs. Also, you
would not be able to unify Hiragana and Ro-maji pairs into
single codepoints. (ro-maji are context sensitive, for one
thing)
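
As an illustration of the context sensitivity mentioned here, the following Python sketch handles one rule only: the small tsu (っ) has no sound of its own and instead doubles the consonant of the following kana. The table is a small made-up sample in a Hepburn-like style, and romanize_contextual is a hypothetical name.

```python
# Illustrative sample table; a real scheme needs many more entries and rules.
BASE = {"き": "ki", "て": "te", "が": "ga", "こ": "ko", "う": "u"}
SOKUON = "っ"  # small tsu: doubles the next consonant


def romanize_contextual(text):
    """Romanize kana, letting a preceding small tsu change the next syllable."""
    out = []
    double_next = False
    for ch in text:
        if ch == SOKUON:
            double_next = True   # remember the context, emit nothing yet
            continue
        r = BASE.get(ch, ch)
        if double_next and ch in BASE:
            r = r[0] + r         # "te" -> "tte", "ko" -> "kko"
        double_next = False
        out.append(r)
    return "".join(out)


print(romanize_contextual("きって"))
print(romanize_contextual("がっこう"))
```

The output for a given kana depends on what came before it, which is why a one-codepoint-to-one-string table is not enough and why such rules need contextual lookups.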


The transliteration rules for these transformations can be
context sensitive (like any other AAT / OpenType / Graphite
shaping or positioning feature). Other contextual shaping
and positioning features could be used in conjunction with


If you want more information look at:
  <http://developer.apple.com/fonts/Registry/index.html>
  <http://developer.apple.com/fonts/TTRefMan/RM06/Chap6feat.html>
  <http://developer.apple.com/fonts/TTRefMan/RM06/Chap6mort.html>
  <http://developer.apple.com/fonts/TTRefMan/RM06/Chap6Tables.html>

ICU also has a Transliterator class
  <http://oss.software.ibm.com/icu/apiref/classTransliterator.html>
  <http://oss.software.ibm.com/icu4j/doc/com/ibm/icu/text/Transliterator.html>

regards

- chris

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
