Hello Meor,

----- Original Message ----
<<From: Meor Ridzuan Meor Yahaya <[EMAIL PROTECTED]>

1. The usage of 649 in its final form should always represent alef maqsura, so we can easily look for it. For other dotless final yeh, we can use Farsi yeh, or even 64A, with a locale system attached to it. But for now, maybe we can keep it as Farsi yeh.>>

I suggest not using 649, since it is an unnecessary character - Farsi yeh covers it. IMHO it should not have entered Unicode in the first place, but it was probably carried over to Unicode from the legacy ISO Arabic encoding. (And hopefully the name of Farsi yeh can be changed so that it is Farsi and Classical Arabic - and possibly more - yeh.)

<<2. 626 should be used. This will make it easier and more understandable, because we know what 626 is. If we encode it as 649 + hamza above/below, someone might mistakenly think the 649 is alef maksura, which in this case it definitely is not.>>

I strongly suggest not using 626, but rather the separate hamza above/below codepoint. This is better normalization of text. Besides, you have to use a separate small alef anyway. So use both a separate hamza above/below and a separate small alef for consistency. Did I tell you this was better for normalization? :)

<<3. Now we are left with dotless yeh with small alef in the initial and medial forms. From previous mail, the suggestion was to use 649 + 670. Of course, visually, it is easy to tell that this is not alef maksura, but rather a dotless yeh serving as the chair for small alef. However, developing an algorithm to search for it is not as easy/straightforward. I think that is why someone was suggesting that I use dotless ba instead of 649. Any suggestions?>>

Dotless beh is a non-starter for this purpose. It is what it is: a dotless "beh". It is intended for an archaic, ambiguous beh/teh/theh/yeh character. The seat of small alef is not an ambiguous character; it is dotless, but it is a "yeh". The algorithm for searching for a small alef with a dotless yeh chair is simply to search for the code sequence yeh+superscript_alef.

<<Anyway, to make no. 1 happen, I need to have some word list initially so that I can look for the words and make the necessary changes.
First, I will probably change all final yeh (of course, all are dotless) to Farsi yeh ATM, then change the necessary words to use 649. After that is done, maybe all occurrences of yeh can/should be changed to Farsi yeh, just to make it consistent. For no. 2, it should not be a problem for me to change all. I just need to work on no. 3. Maybe at the moment I can go ahead with dotless ba. Later, if someone can come up with a better solution, I can change it back. This will be easy because there is no other use of dotless ba anywhere.>>

My recommendation is to convert all yehs - alef maqsuras, yeh seats of hamza, yeh seats of small alef, regular yehs, final dotless yehs - to Farsi yeh. Searching is no problem. Here is the algorithm:

If you're looking for alef maqsura, or more properly a final dotless yeh that is pronounced like alef, look for:
  fatha+farsi_yeh at the end of word,
  fathatan+farsi_yeh at the end of word,
  farsi_yeh+superscript_alef at the end of word

If you're looking for a yeh seat of hamza, look for:
  farsi_yeh+hamza_above,
  farsi_yeh+hamza_below

If you're looking for a yeh seat of small alef, look for:
  farsi_yeh+small_alef

If you're looking for final dotless yehs, look for:
  farsi_yeh at the end of word

It should be pretty straightforward, thanks to the immensely vocalized Fahd/Madinah Mushaf.

Regards,
Mete
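
For illustration only, the conversion and search rules above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested tool: the names `to_farsi_yeh`, `PATTERNS`, and `kinds` are hypothetical, "end of word" is assumed to mean "followed by whitespace or the end of the text", and small_alef is taken to be the same code point as superscript_alef (U+0670).

```python
import re

# Code points used in the discussion.
ALEF_MAKSURA = "\u0649"      # "649"
ARABIC_YEH = "\u064A"        # "64A"
YEH_WITH_HAMZA = "\u0626"    # "626"
FARSI_YEH = "\u06CC"
FATHA = "\u064E"
FATHATAN = "\u064B"
HAMZA_ABOVE = "\u0654"
HAMZA_BELOW = "\u0655"
SUPERSCRIPT_ALEF = "\u0670"  # "670", the small alef

# Assumption: a word ends where no non-whitespace character follows.
END = r"(?!\S)"

# Convert every yeh variant to Farsi yeh; 626 becomes Farsi yeh + hamza above.
_TO_FARSI = str.maketrans({
    ALEF_MAKSURA: FARSI_YEH,
    ARABIC_YEH: FARSI_YEH,
    YEH_WITH_HAMZA: FARSI_YEH + HAMZA_ABOVE,
})


def to_farsi_yeh(text):
    """Return text with all yeh variants replaced by Farsi yeh."""
    return text.translate(_TO_FARSI)


# One pattern per search rule, written as literal code sequences.
PATTERNS = {
    "alef_maqsura": re.compile(
        "(?:" + FATHA + FARSI_YEH
        + "|" + FATHATAN + FARSI_YEH
        + "|" + FARSI_YEH + SUPERSCRIPT_ALEF + ")" + END
    ),
    "hamza_seat": re.compile(FARSI_YEH + "[" + HAMZA_ABOVE + HAMZA_BELOW + "]"),
    "small_alef_seat": re.compile(FARSI_YEH + SUPERSCRIPT_ALEF),
    "final_yeh": re.compile(FARSI_YEH + END),
}


def kinds(word):
    """Return the names of all rules that match somewhere in word."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(word)}
```

Note that, read literally, the rules overlap: a word-final farsi_yeh+superscript_alef satisfies both the alef maqsura rule and the small alef seat rule, so a caller who wants only initial/medial seats would need to exclude word-final matches.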

