Re: Questions about yeh, hamzah on yeh, alef maksura and dotless ba

Gregg Reynolds Thu, 22 Dec 2005 20:10:53 -0800

Meor Ridzuan Meor Yahaya wrote:

Mete,
I think your solution is lacking one thing: we can't tell where alef maksura is. Other than that, I don't have any problem. BTW, why is it important to have normalization?


Hi again,

As I see it, normalization will make various kinds of text handling (esp. search) easier. For example, if hamza is always encoded as a distinct codepoint (i.e. never use 622/623/624/625/626) then obviously searching for hamza is easy. That's good, because the seat of the hamza has (in general) no semantic significance - it's the hamza that counts. But if you want to search for a particular seat, that's easy too - search for the seat codepoint (627/648/649) followed by hamza. To find a final dotless-yeh-qua-alef, just search for 649 followed by a word separator.
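
As a rough illustration of the three searches described above, here is a minimal Python sketch. It rests on several assumptions that are not in the post: the distinct hamza codepoint is taken to be U+0654 (ARABIC HAMZA ABOVE), the names `DECOMPOSE`, `normalize`, `find_hamza`, `find_seat` and `find_final_dotless_yeh` are made up for the example, and a "word separator" is approximated as whitespace or end of text. U+0622 and U+0625 are left out of the table because the post does not say how madda or a hamza below the seat would be represented.

```python
import re

# Assumption: the "distinct hamza codepoint" is U+0654 (ARABIC HAMZA ABOVE).
HAMZA = "\u0654"

# Hypothetical table: precomposed letter -> seat codepoint followed by hamza.
DECOMPOSE = {
    "\u0623": "\u0627" + HAMZA,  # alef with hamza above -> alef (627) + hamza
    "\u0624": "\u0648" + HAMZA,  # waw with hamza above  -> waw (648) + hamza
    "\u0626": "\u0649" + HAMZA,  # yeh with hamza above  -> dotless yeh (649) + hamza
}

def normalize(text):
    """Replace each precomposed hamza letter with its seat + hamza pair."""
    return "".join(DECOMPOSE.get(ch, ch) for ch in text)

def find_hamza(text):
    """Positions of every hamza in the normalized text, whatever its seat."""
    return [m.start() for m in re.finditer(HAMZA, normalize(text))]

def find_seat(text, seat):
    """Positions where the given seat codepoint is followed by hamza."""
    pattern = re.escape(seat + HAMZA)
    return [m.start() for m in re.finditer(pattern, normalize(text))]

def find_final_dotless_yeh(text):
    """Positions of 649 followed by whitespace or end of text
    (a simplification of "word separator")."""
    return [m.start() for m in re.finditer("\u0649(?=\\s|$)", normalize(text))]
```

All positions returned are offsets into the normalized text, not the input. For instance, `find_seat(text, "\u0627")` would locate only the hamzas seated on alef, while `find_hamza(text)` ignores the seat entirely.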

My recommendation is to convert all yehs - alef maqsuras, yeh seats
of hamza, yeh seats of small alef, regular yehs, final dotless yehs
- to Farsi yeh. Searching is no problem. Here is the algorithm:

Maybe I'm not understanding Mete, but I don't see how this could work at all. Aside from the semantics I've mentioned in another post, Farsi yeh takes dots in initial and medial forms, no? So how can it be the seat of a hamza or a small alif in those contexts?


-gregg

