Abdulhaq Lynch wrote:
>> On Saturday 25 June 2005 12:42, Thomas Milo wrote:
>>> Unicode wants to encode writing systems, not conventions within a
>>> writing system nor graphic variations for the same abstract units
>>> of writing that deal with a particular document.
>>>
>>> In the case of Mushafs, this means that if the same orthographic
>>> unit (grapheme) varies in form between Mushafs, but not in
>>> function (e.g. various instances of regional tanween forms that all
>>> boil down to the exact same thing), propose to encode the
>>> abstraction; do not bother them with calligraphic/typographic
>>> idiosyncrasies. By the same token, do not encode ras khaa, when it
>>> is a sukun (this one slipped through the net because nobody knew why
>>> it was there). As a first step in digitization we should reduce all
>>> the units of script to their abstract essence and define their
>>> various appearances as regional variations/traditions that can be
>>> dealt with by font technology and text mark-up.
>>>
>>
>> Makes sense.
>>
>> What do you think of my example of the Pakistani tanween with small
>> meem, indicating tanween + iqlaab, which from the grapheme point of
>> view is in addition to and offset from the tanween?
>> (http://kprayertime.sourceforge.net/calligraphy/tanween-dammataan-iqlaab.png )
>>
>> Doesn't this indicate that iqlaab should be encoded as such, and not
>> incorporated into the tanween?

Well, in my view this is an example of how not to identify graphemes. The Egyptian and Saudi editions express iqlaab with a ligature of vowel and small meem; your example shows a tanween ligature with small meem. The underlying grapheme is identical in both cases: tanween+iqlaab.

The first thing to agree on is to encode iqlaab as a separate grapheme. What remains then is how to encode tanween. Unicode adopted the tanween ligatures as separate codes. My opinion is that fathatan, dhammatan and kasratan are not graphemes, but ligatures consisting of exactly what their Arabic names indicate: two fathas, two dhammas and two kasras.

Now, there was the authoritative source claiming that there was originally a single vowel followed by a small or big noon. I consulted another authoritative source, Dr Gerd-Rüdiger Puin, researcher into the history of Qur'anic orthography, and he confirmed my observation that the oldest manuscripts express tanween with two horizontally aligned vowel signs. This is also how Yasin Dutton describes them: no trace of a small noon, let alone a big one.

Yet, as a logical device, I like the elegance of the formula:

[vowel <a/u/i>] + [any tanween <regular/iqlaab/idgaam>]

(as many tanween types as you can identify; these are the three expressed in the Saudi orthography, AFAIK).

This structure guarantees that searching works in existing Unicode-enabled environments. It also guarantees that modern font technology can take care of the shapes, whether Pakistani, Egyptian, Saudi or North African. This approach would mean that on the level of plain text code, Qur'ans remain identical when they do not conceptually differ, and it would make research into real differences much more efficient. A simple canonical equivalence ensures legacy compatibility with the existing fathatan, dhammatan and kasratan.

t
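
To make the [vowel]+[tanween] formula concrete, here is a minimal Python sketch of how the mapping might look. It is only an illustration under stated assumptions: the three tanween marks do not exist in Unicode as such, so they are parked on Private Use Area code points (U+E000 to U+E002) purely as placeholders, and the function names `decompose`, `compose` and `count_tanween` are invented for this example. The vowel and legacy tanween code points are the real Unicode ones.

```python
import re

# Real Unicode code points for the short vowels and the legacy tanween signs.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"

# HYPOTHETICAL tanween marks: Private Use Area placeholders, not real encodings.
TANWEEN_REGULAR, TANWEEN_IQLAAB, TANWEEN_IDGAAM = "\uE000", "\uE001", "\uE002"

# The "canonical equivalence": each legacy sign equals vowel + regular tanween.
LEGACY_TO_SEQUENCE = {
    FATHATAN: FATHA + TANWEEN_REGULAR,
    DAMMATAN: DAMMA + TANWEEN_REGULAR,
    KASRATAN: KASRA + TANWEEN_REGULAR,
}
SEQUENCE_TO_LEGACY = {seq: legacy for legacy, seq in LEGACY_TO_SEQUENCE.items()}

# One pattern matches any vowel followed by any of the three tanween types.
ANY_TANWEEN = re.compile(
    f"[{FATHA}{DAMMA}{KASRA}][{TANWEEN_REGULAR}-{TANWEEN_IDGAAM}]"
)


def decompose(text: str) -> str:
    """Rewrite legacy tanween signs as vowel + regular tanween."""
    return "".join(LEGACY_TO_SEQUENCE.get(ch, ch) for ch in text)


def compose(text: str) -> str:
    """Rewrite vowel + regular tanween back to the legacy signs."""
    for seq, legacy in SEQUENCE_TO_LEGACY.items():
        text = text.replace(seq, legacy)
    return text


def count_tanween(text: str) -> int:
    """Count tanween of any type, regardless of how the text was encoded."""
    return len(ANY_TANWEEN.findall(decompose(text)))


# "kitaabun" spelled with the legacy dammatan, and with a hypothetical iqlaab.
legacy_word = "\u0643\u062A\u0627\u0628" + DAMMATAN
iqlaab_word = "\u0643\u062A\u0627\u0628" + DAMMA + TANWEEN_IQLAAB
```

With this layout a single search finds tanween in both spellings, while the regional shape (Pakistani, Egyptian, Saudi, North African) would be left entirely to the font.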

