Meor Ridzuan Meor Yahaya wrote:
Mete,
I think your solution is lacking one thing: we can't tell where the
alef maksura is. Other than that, I don't have any problem with it. BTW,
why is it important to have normalization?
Hi again,
As I see it, normalization will make various kinds of text handling
(esp. search) easier. For example, if hamza is always encoded as a
distinct codepoint (i.e. never use U+0622/U+0623/U+0624/U+0625/U+0626), then obviously
searching for hamza is easy. That's good, because the seat of the hamza
has (in general) no semantic significance - it's the hamza that counts.
But if you want to search for a particular seat, that's easy too -
search for the seat codepoint (U+0627/U+0648/U+0649) followed by hamza. To find
a final dotless-yeh-qua-alef, just search for U+0649 followed by a word
separator.
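A minimal Python sketch of the normalize-then-search idea above. It makes several assumptions that the post does not state. The "distinct codepoint" for hamza is taken to be the combining marks U+0654 (hamza above) and U+0655 (hamza below). The yeh seat is U+0649, as described above, whereas standard Unicode NFD would use U+064A. A "word separator" is whitespace, common ASCII punctuation, the Arabic comma, semicolon or question mark, or end of text. All names (`normalize`, `find_seat`, etc.) are hypothetical, and returned offsets refer to the normalized string.

```python
import re

# Assumed hamza encodings: spacing hamza plus the two combining hamza marks.
ANY_HAMZA = "[\u0621\u0654\u0655]"
HAMZA_MARKS = "[\u0654\u0655]"  # combining hamza only, i.e. hamza on a seat

# Hypothetical decomposition: each precomposed letter becomes seat + mark.
DECOMPOSE = str.maketrans({
    "\u0622": "\u0627\u0653",  # alef with madda -> alef + madda (not counted as hamza here)
    "\u0623": "\u0627\u0654",  # alef with hamza above
    "\u0624": "\u0648\u0654",  # waw with hamza above
    "\u0625": "\u0627\u0655",  # alef with hamza below
    "\u0626": "\u0649\u0654",  # yeh with hamza above -> dotless yeh (U+0649) seat
})

# Assumed word separator: whitespace, some punctuation, or end of text.
SEPARATOR = r"(?=[\s\u060C\u061B\u061F.,;:!?]|$)"


def normalize(text: str) -> str:
    """Replace precomposed hamza letters with seat + hamza sequences."""
    return text.translate(DECOMPOSE)


def has_hamza(text: str) -> bool:
    """True if any hamza occurs, whatever its seat."""
    return re.search(ANY_HAMZA, normalize(text)) is not None


def find_seat(text: str, seat: str) -> list[int]:
    """Offsets where `seat` is immediately followed by a combining hamza."""
    pattern = re.escape(seat) + HAMZA_MARKS
    return [m.start() for m in re.finditer(pattern, normalize(text))]


def find_final_alef_maksura(text: str) -> list[int]:
    """Offsets of U+0649 directly before a word separator (not before a hamza)."""
    pattern = "\u0649" + SEPARATOR
    return [m.start() for m in re.finditer(pattern, normalize(text))]


if __name__ == "__main__":
    masail = "\u0645\u0633\u0627\u0626\u0644"  # a word with yeh-with-hamza inside
    print(has_hamza(masail), find_seat(masail, "\u0649"))
    print(find_final_alef_maksura("\u0639\u0644\u0649 \u0645\u0635\u0631"))
```

Because the hamza mark sits between the seat and whatever follows it, a seat-of-hamza U+0649 is never mistaken for a word-final one. The sketch does not settle whether alef with madda should also count as a hamza.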
My recommendation is to convert all yehs - alef maqsuras, yeh seats
of hamza, yeh seats of small alef, regular yehs, final dotless yehs
- to Farsi yeh. Searching is no problem. Here is the algorithm:
Maybe I'm not understanding Mete, but I don't see how this could work at
all. Aside from the semantics I've mentioned in another post, Farsi yeh
takes dots in initial and medial forms, no? So how can it be the seat
of a hamza or a small alif in those contexts?
-gregg
_______________________________________________
General mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/general