On Saturday 26 June 2004 10:19, Mete Kural wrote:
> Salaam Mohammed,
>
> > I would please ask you to at least read the point about
> > Normalization of the Qur'an text in this post carefully and
> > comment on it.
>
> I can think of alternative ways of doing the normalization you
> referred to. A more intelligent algorithm could detect
> alef_maksura+superscript_alif and other similar sequences and
> normalize them accordingly.

This won't work. A superscript alef cannot be detected from its context,
because it is a vowel sign and can sit on top of ANY letter, not just
alef_maksura. There are no pre-defined letters or sequences that a
superscript alef is restricted to; it can be attached to anything. It is
also not specific to the Qur'an at all, so you can't just hard-code
sequences: the same sequence can contain either a superscript alef or a
small alef. Hence the normalization algorithm would have to know the exact
locations of the superscript alefs in the Qur'an, and it wouldn't be usable
for anything else.

For example, suppose a document that quotes verses from the Qur'an needs to
be normalized for spellchecking, and it contains the misspelled
non-Qur'anic word:
ØØØÙØØ
(It is misspelled because a small alef cannot be used here, so the sign is
a superscript alef and a full alef is missing.)

The same document also quotes some verses of the Qur'an, and one of them
has the correctly spelled word:
ØØØÙØØ
(It is correctly spelled because a small alef may be used in Qur'anic
text.)

After normalization the two words become the same word:
ØØØØØØ
and the spellchecker reports no spelling mistakes, although there is one in
the first word.

Then there are the shaping and rendering problems. For example, how would
you describe that single character in ArabicShaping.txt? It is listed there
as a transparent character that has no effect on the shaping process, which
is correct for the superscript alef. The small alef, however, does affect
shaping and cannot be considered transparent. And will you put a dotted
circle below it in the code chart or not? Such a circle is fine for the
superscript alef, but what about the small alef, which is a base character
and not an NSM?
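
The collision described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions:
normalize_for_spellcheck is a hypothetical normalizer that expands every
U+0670 into a full alef, and the sample sequence (reh, hah, meem, U+0670,
noon) is made up for the demonstration; it is not the word from the example
above. The describe helper is also hypothetical and simply reads the name,
general category and combining class reported by Python's unicodedata
module.

```python
import unicodedata

SUPERSCRIPT_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF
ALEF = "\u0627"              # ARABIC LETTER ALEF


def normalize_for_spellcheck(text):
    """Hypothetical normalizer: expand every U+0670 into a full alef."""
    return text.replace(SUPERSCRIPT_ALEF, ALEF)


def describe(ch):
    """Return (name, general category, combining class) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# Illustrative sequence only: reh, hah, meem, U+0670, noon.
sample = "\u0631\u062d\u0645\u0670\u0646"

# The same code points, typed with two different intentions.
as_small_alef = sample  # quoted verse: U+0670 stands for a letter
as_vowel_sign = sample  # ordinary text: U+0670 is only a vowel sign

# Both inputs collapse to one string, so a spellchecker that runs after
# this step has no way to flag only one of them.
collide = (
    normalize_for_spellcheck(as_small_alef)
    == normalize_for_spellcheck(as_vowel_sign)
)

print(collide)
print(describe(SUPERSCRIPT_ALEF))
print(describe(ALEF))
```

With the Unicode data shipped in Python, U+0670 is reported as a
nonspacing mark (Mn) and U+0627 as a letter (Lo). That property-level
distinction is what a single shared codepoint cannot carry for both roles.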

> Such intelligent algorithms are necessary in Arabic Quran searching
> anyways; for instance in order to detect instances of the word "Allah"
> in various grammatical contexts without including words such as
> "allahumma" (refer to Abdulbaki's Quran index).

No, they are not really necessary. Any general algorithm should be able to
search the Qur'an (the example you gave is really simple, because the first
four letters of allahumma are the word Allah). I agree that advanced search
options require a dedicated search engine, but for something as simple as
differentiating between a letter and a vowel sign it would be completely
wrong to rely on special algorithms that are specific to the Qur'an and
that are not going to be implemented in a simple text editor. A simple text
editor must be able to differentiate between letters and vowel signs
without requiring special algorithms for every book out there. It must
recognize a letter from its Unicode properties and a vowel sign from its
Unicode properties, not by hard-coding sequences and such (which would
still not work anyway).

> We have sent each other tens of emails already regarding this
> superscript alef issue and we do not seem to agree on it. If I had the
> time right now I could discuss this dagger alef issue with you further,
> but unfortunately I do not. That is why I am not able to respond to the
> points you make in your last email. Insha'Allah in time our differences
> may resolve.

You should be able to recognize that this is not a problem between the two
of us. I'm not giving my opinions here, I'm stating facts. But since I
don't like to be the one who is holding things back, I keep discussing the
facts with you and I keep noting the various reasons why your suggestions
won't work. However, I don't think that Unicode would encode an Arabic
letter and a vowel sign using the same codepoint even if your solutions
worked.
This is not logical at all; vowel signs have nothing to do with Arabic
letters.

BTW: No need for that "dagger" attitude.

> But at this time I will suggest you that if you wish to submit a
> proposal for a new dagger alef codepoint please do it as a separate
> proposal. This way we jointly propose on items that we agree on, and
> submit separate proposals for items that we do not agree on. It is
> better than separately submitting two different proposals.
>
> If you wish to get familiar with the process of submitting a proposal
> to Unicode, we can submit the joint proposal first so that you gain
> that experience. And afterwards if you wish you can submit your dagger
> alef proposal since you will already know the details of submitting a
> proposal.
>
> As I am telling you, unfortunately I won't be able to allocate more
> time in the near future for discussing the dagger alef issue with you.
> We may return to it at a later date when I have more time.
>
> Please respond to us regarding your decision.
>
> Kind regards,
> Mete

Do you think that I'm not busy? We are all busy. Time can always be
allocated, and for something as important as the Qur'an I'm willing to
sacrifice anything else (including a job). Anyway, this is not applicable
here, since we have plenty of time and there is no need to rush. We will
wait until you have some time to allocate for this extremely important
issue. I think this will also leave some time for all of us.

--
Mohammed Yousif
Egypt
_______________________________________________
General mailing list
[EMAIL PROTECTED]
http://lists.arabeyes.org/mailman/listinfo/general

