Mete Kural wrote:
I suggest not using 0649, since it is an unnecessary character: Farsi
yeh covers it. IMHO it should never have entered Unicode in the first
place, but it was probably carried over to Unicode from the legacy ISO
Arabic encoding.
I don't understand the objection, nor the assertion that it is
"unnecessary". In my view, Farsi yeh and 064A have exactly the same
"orthosemic" ;) semantics. Otherwise, you would have to argue that Farsi
yeh has unstable semantics: it means one thing when it has dots, but
when it doesn't, it may mean either of two things, and you can't tell
which from the encoding itself; you have to analyze the word. That's
not good. The objective should be to pack as much information as
possible into the encoding.
Dotless yeh (0649) is absolutely essential: it is the only way to
encode a meaningless (purely surface) dotless yeh, which is an essential
part of Arabic orthography, e.g. 0649 as the seat of hamza and as final
alef. (Whether that alef is maqsura or not is a matter of grammatical
analysis, beyond the scope of an encoding design.)
(And hopefully Farsi yeh can be renamed to reflect that it is the yeh
of Farsi and Classical Arabic, and possibly of other languages too.)
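For reference, the codepoints under discussion can be listed with
Python's standard unicodedata module. This is a small illustrative
sketch; the helper names label and folded_by_nfkc are made up for this
example. It also checks that 064A, 06CC and 0649 are not folded into one
another even by compatibility normalization (NFKC), so whichever of them
is chosen is what carries the distinction in the encoded text.

```python
import unicodedata

# Codepoints discussed in this thread (illustrative list).
DISCUSSED = ["\u064a", "\u06cc", "\u0649", "\u0626",
             "\u0654", "\u0655", "\u0670", "\u066e"]

def label(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def folded_by_nfkc(a: str, b: str) -> bool:
    """True if NFKC normalization makes a and b identical."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

# Print the official Unicode name of each codepoint.
for ch in DISCUSSED:
    print(label(ch))

# The three yeh-like letters stay distinct under NFKC.
yehs = ["\u064a", "\u06cc", "\u0649"]
print(any(folded_by_nfkc(a, b) for a in yehs for b in yehs if a != b))  # False
```

Note that the official name of 0649 is ARABIC LETTER ALEF MAKSURA, which
is the naming issue raised further on in this thread.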
<<2. 0626 should be used. This will make it easier and more
understandable, because we know what 0626 is. If we encode it as 0649 +
hamza above/below, someone might mistakenly think the 0649 is alef
maksura, which in this case it definitely is not.>>
I strongly suggest not using 0626, but rather the separate hamza
above/below codepoint. This gives better normalization of text.
Besides, you have to use a separate small alef anyway, so use both a
separate hamza above/below and a separate small alef for consistency.
Did I mention this was better for normalization? :)
Here I agree with you. I don't think there's a risk of confusing
0649+hamza with "alef maqsura"; or rather, I think any confusion comes
from Unicode's poor naming of the codepoint. But if we call it dotless
yeh or the like, there can be no confusion.
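To make the normalization point concrete, here is a minimal check using
Python's standard unicodedata module (the helper codepoints is
hypothetical, for display only). As far as Unicode normalization is
concerned, the precomposed 0626 is canonically equivalent to 064A + 0654
(dotted yeh plus hamza above), not to 0649 + 0654: NFD splits 0626, NFC
recombines 064A + 0654 into 0626, and 0649 + 0654 is left untouched.

```python
import unicodedata

def codepoints(s: str) -> str:
    """Show a string as space-separated U+XXXX codepoints."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

precomposed = "\u0626"          # ARABIC LETTER YEH WITH HAMZA ABOVE
yeh_hamza = "\u064a\u0654"      # ARABIC LETTER YEH + ARABIC HAMZA ABOVE
dotless_hamza = "\u0649\u0654"  # ARABIC LETTER ALEF MAKSURA + ARABIC HAMZA ABOVE

print(codepoints(unicodedata.normalize("NFD", precomposed)))    # U+064A U+0654
print(codepoints(unicodedata.normalize("NFC", yeh_hamza)))      # U+0626
print(codepoints(unicodedata.normalize("NFC", dotless_hamza)))  # U+0649 U+0654
```

So encoding hamza as 0649 + 0654 is a separate spelling choice rather
than something a normalizer will derive from 0626, and text meant to
keep a decomposed hamza should not be passed through NFC if it uses
064A + 0654.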
<<3. Now we are left with dotless yeh with small alef in the initial
and medial forms. In a previous mail, the suggestion was to use 0649 +
0670. Of course, visually it is easy to tell that this is not alef
maksura, but rather a dotless yeh serving as the chair for the small
alef. However, developing an algorithm to search for it is not as
easy or straightforward. I think that is why someone suggested to me
to use dotless ba instead of 0649. Any suggestions?>>
Dotless beh is a non-starter for this purpose. It is what it is: a
dotless "beh". It is intended for an archaic ambiguous
Agreed.
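On the search question in point 3, one possible approach is sketched
below; it is an assumption, not a tested algorithm. Since encoded text
stores no initial/medial forms, an occurrence of 0649 + 0670 is treated
as non-final when, after skipping any further combining marks, the next
character is an Arabic letter. The function names and the "Arabic
letter" heuristic (general category L* and a Unicode name starting with
ARABIC) are hypothetical; ZWJ/ZWNJ and other edge cases are not handled.

```python
import unicodedata

DOTLESS_YEH = "\u0649"       # ARABIC LETTER ALEF MAKSURA (dotless yeh)
SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF (small alef)
SEQ = DOTLESS_YEH + SUPERSCRIPT_ALEF

def is_arabic_letter(ch: str) -> bool:
    # Heuristic: any letter whose Unicode name starts with "ARABIC"
    # (this also counts ARABIC TATWEEL, which keeps the joining going).
    return (unicodedata.category(ch).startswith("L")
            and unicodedata.name(ch, "").startswith("ARABIC"))

def find_seated_small_alef(text: str) -> list[int]:
    """Return indices where 0649 + 0670 occurs in non-final position."""
    hits = []
    i = text.find(SEQ)
    while i != -1:
        j = i + len(SEQ)
        # Skip further combining marks (harakat etc., category Mn).
        while j < len(text) and unicodedata.category(text[j]) == "Mn":
            j += 1
        # Non-final if another Arabic letter follows in the same word.
        if j < len(text) and is_arabic_letter(text[j]):
            hits.append(i)
        i = text.find(SEQ, i + 1)
    return hits
```

For example, find_seated_small_alef("\u0628\u0649\u0670\u0646") returns
[1], while the same sequence at the end of a word is not reported.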
-g
_______________________________________________
General mailing list
http://lists.arabeyes.org/mailman/listinfo/general