This is all well understood. It is quite unfortunate, but that is how Unicode currently is. However:
1) The original issue that started this (sub)thread: if I have
   <U+0622,U+0670> or <U+0627,U+0653,U+0670>, I expect the small Alef to
   be above the Madda. Placing the Madda above the small Alef gives the
   sequence a totally new meaning, which is unacceptable IMHO.

2) The fact that U+0622 is canonically equivalent to <U+0627,U+0653>
   pretty much rules out using U+0653 as a vowel mark; no other vowel
   mark in the Arabic block exhibits such normalization behaviour. This
   shouldn’t prevent some Arabic-script-using language from using it as
   a modifier, as I would expect it then to still behave as an MCM mark.

Regards,
Khaled

On Fri, Oct 18, 2013 at 04:33:01PM -0700, Roozbeh Pournader wrote:
> Let me try to approach the problem from another angle.
>
> Unicode, although originally planned to be more semantic, has become more
> and more a graphical encoding. This can be evidenced by the new characters
> encoded or not encoded. The UTC continuously refers people to use existing
> code points for things that are graphically similar to already-encoded
> characters but are semantically very different, but encodes new characters
> that are semantically the same as existing characters, but their exact
> visual representation is important and is based on rules that are very hard
> to derive.
>
> This is inevitable to some degree, since text rendering technology and
> fonts should not be expected to be very complex. So plain text
> representation becomes more visual in order to make life easier for the
> rendering engines.
>
> This can be evidenced by a lot of the newer characters in the Arabic
> blocks. The open tanweens or arrowheads in the Arabic Extended-A block were
> encoded because they were graphically different, while the committee did
> not encode a "waw with madda above" and recommended "waw+madda above" to be
> used for it instead.
> The diacritical hamza was the most controversial, and
> the controversy is the main reason for the hole at U+08A1 (it is reserved
> for a Beh With Hamza Above, which will be in Unicode 7.0).
>
> All in all, this means that UTC considers anything that very much looks
> like U+0653 a madda above, and anything that may need to be visually
> distinguished from it and be smaller in size a small high madda. The glyphs
> used in the chart show a significant size difference, and has been showing
> that difference since the small high madda got encoded in Unicode 2.0.
> Unicode actually doesn't prescribe exact usage of a lot of the Koranic
> marks, because the marks may be used very differently across the various
> Koranic traditions from Indonesia to Morocco.
>
> I don't think it's a good idea to consider madda to be a certain kind of
> hamza. Yes, in the modern Arabic language Alef+madda above is semantically
> equivalent to hamza+alef or alef+alef, but there is no hint of a hamza
> semantic when some minority languages using the Arabic script takes a madda
> and puts it over a waw to get a new vowel.
>
> I understand that means that there may be no "real" semantic difference
> between a normal madda and a small high madda, but there's really no
> semantic difference between a yeh and a farsi yeh either, and they are
> separately encoded. Unicode is quite graphical in its encoding.
>
> Regarding U+06C7 and U+06C8, the UTC has agreed to not encode such
> characters anymore, except for the use of hamza above for diacritic usages
> of non-hamza semantics. So there may as well be future siblings for U+0681,
> U+076C, U+08A1, and U+08A8, but no future siblings to U+06C7 and U+06C8.
>
> Please tell me if there's anything I've missed to address.
>
>
> On Fri, Oct 18, 2013 at 3:18 PM, Khaled Hosny <[email protected]> wrote:
>
> > On Fri, Oct 18, 2013 at 02:57:43PM -0700, Roozbeh Pournader wrote:
> > > Khaled, you are referring to a specific style of writing the Koran.
> > > There are several others, which Unicode should be able to represent.
> >
> > I’m not sure I follow here, if you think there should be a way to
> > differentiate between two forms of prolongation mark (aka Quranic
> > Madda), something I have never seen but i’m open to learn something new,
> > then a new code point should be encoded, instead of abusing a Hamza (aka
> > the other Madda) that has an incompatible normalization behaviour in
> > Unicode.
> >
> > And you ignored my other point.
> >
> > Regards,
> > Khaled
> >
> > > On Fri, Oct 18, 2013 at 2:47 PM, Khaled Hosny <[email protected]> wrote:
> > >
> > > > On Fri, Oct 18, 2013 at 02:26:15PM -0700, Roozbeh Pournader wrote:
> > > > > On Fri, Oct 18, 2013 at 2:23 PM, Khaled Hosny <[email protected]> wrote:
> > > > >
> > > > > > Furthermore, <alef,quranic madda> ≠ <alef with madda above>
> > > > > >
> > > > >
> > > > > Why?
> > > >
> > > > Because every Mushaf printed in Egypt (and most of the Arabic world)
> > > > since 1919[1] has a note at the end of Madda description stating that
> > > > “… and this mark should not be used to indicate an omitted Alef
> > > > after[sic] a written Alef, as in آمنوا, that were mistakingly put in
> > > > many Mushafs …”, which to me is a very frank indication that the two
> > > > marks are not the same thing.
> > > >
> > > > Also a vowel mark (which the Quranic Madda is) should not “blend” with
> > > > its base letter, the same way that U+06C7 is not canonically equivalent
> > > > to <U+0648,U+064F> etc.
> > > >
> > > > Regards,
> > > > Khaled
> > > >
> > > > 1. The date of first Mushaf printed by Al-Azhar where most of the
> > > > Quranic annotation marks were formalized and standardized.
> > > >
> >

_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
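
For anyone who wants to check the normalization behaviour discussed in this thread, here is a minimal sketch using only Python's standard `unicodedata` module. It is not taken from any message above. The constant names and the `codepoints` and `report` helpers are made up for illustration, and the results depend on the Unicode data shipped with your Python build. It looks at three things: the canonical equivalence of U+0622 and <U+0627,U+0653>, the canonical reordering of U+0653 and U+0670 (which may be related to the mark order described in point 1), and the fact that U+06C7 has no canonical decomposition.

```python
import unicodedata

# Illustrative names for the code points mentioned in the thread.
ALEF = "\u0627"        # ARABIC LETTER ALEF
ALEF_MADDA = "\u0622"  # ARABIC LETTER ALEF WITH MADDA ABOVE
MADDA = "\u0653"       # ARABIC MADDAH ABOVE
SMALL_ALEF = "\u0670"  # ARABIC LETTER SUPERSCRIPT ALEF
WAW = "\u0648"         # ARABIC LETTER WAW
DAMMA = "\u064F"       # ARABIC DAMMA
LETTER_U = "\u06C7"    # ARABIC LETTER U


def codepoints(text):
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def report(label, text):
    """Print the input sequence next to its NFD and NFC forms."""
    nfd = unicodedata.normalize("NFD", text)
    nfc = unicodedata.normalize("NFC", text)
    print(f"{label}: in={codepoints(text)} "
          f"NFD={codepoints(nfd)} NFC={codepoints(nfc)}")
    return nfd, nfc


if __name__ == "__main__":
    # Canonical combining classes decide the order of marks after normalization.
    for name, ch in (("MADDA", MADDA), ("SMALL_ALEF", SMALL_ALEF)):
        print(f"{name} ccc={unicodedata.combining(ch)}")

    report("precomposed + small alef", ALEF_MADDA + SMALL_ALEF)
    report("alef + madda + small alef", ALEF + MADDA + SMALL_ALEF)
    report("letter U", LETTER_U)
    report("waw + damma", WAW + DAMMA)
```

If the combining class of U+0670 is lower than that of U+0653, NFD will place the small Alef before the Madda no matter which order was typed. That is plain canonical reordering and says nothing about what a shaping engine does afterwards.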
