Thomas Milo wrote:
Meor's laudable effort has helped me to return to my original position:
encode graphemes, not glyphs. Keeping the tanween graphemically intact
will improve searchability. So I recently changed my position regarding
tanween to the following formula, which I hope this community will
endorse:
tanween = <vowel> <vowel> + [optional] <modifier>
<vowel>= fatha / dhamma / kasra
<modifier>= tamweem / sequentializer
For backward compatibility,
<vowel> <vowel> = fathatan / dhammatan / kasratan
Hmm. In my opinion, it would be both more useful and more accurate
historically to simply have a couple of TANWEEN codepoints. If I'm not
mistaken, tanween was originally marked using a small nuun and later
evolved into the doubled vowel mark.
For example, using Latin-1 characters as stand-ins:
TANWEEN = ñ
TANWEEN IDGHAM = Ñ
TAMWEEM = %
Examples (x = kha, ç = sheen, ² = shadda):
kitaabuñ
xuçubuÑ m²usan²ada#uÑ
min% [EMAIL PROTECTED]
Now search and sort work much better, and the rendering isn't all that
hairy. Edit logic should also be simpler.
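
A rough sketch in Python of why search gets simpler, assuming the Latin-1
stand-ins from the examples above (they are placeholders, not proposed
code points). The helper names has_tanween and fold_tanween are made up
for illustration, and folding the idgham variant into plain tanween for
matching is my own assumption about what a search would want:

```python
# Latin-1 stand-ins from the examples above; NOT real code points.
TANWEEN = "\u00f1"         # ñ
TANWEEN_IDGHAM = "\u00d1"  # Ñ

# Every mark that counts as "a tanween" for search purposes (assumption).
TANWEEN_MARKS = {TANWEEN, TANWEEN_IDGHAM}


def has_tanween(word):
    """True if the word carries any tanween mark: a per-character test,
    with no need to recognize pairs of vowel marks."""
    return any(ch in TANWEEN_MARKS for ch in word)


def fold_tanween(word):
    """Map the idgham variant to plain TANWEEN so both spellings match."""
    return "".join(TANWEEN if ch in TANWEEN_MARKS else ch for ch in word)


words = ["kitaabu\u00f1", "xu\u00e7ubu\u00d1", "kitaabu"]
with_tanween = [w for w in words if has_tanween(w)]
```

With one code point per tanween, finding it is a single-character
lookup, and the idgham form folds to the plain form with a one-to-one
mapping.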
I wouldn't advise equating pairs of vowel marks with tanween marks at
the level of encoding design.
-gregg
_______________________________________________
General mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/general