Gregg Reynolds wrote:
> Thomas Milo wrote:
>> Gregg Reynolds wrote:
>> The qur'anic assimilation of second one is not yet supported, but it
>> will read like this:
>> khushubu- m:usannadätu-n (DMG: ḫušubu- m:usannadätu-n)
>> As you can see, initial compensatory shaddä is treated differently
>> from morphological shaddä.
>
> Yes; this is an example where a very useful codepoint is unlikely to
> be endorsed by unicode. We could use two shaddas, one phonotactic
> and one lexical. I think there might even be a third case but I
> can't think of it at the moment.

I was not suggesting this as a potential codepoint. I see no graphemic difference between either use of shadda. My reversible transcription algorithm inserts alif-wasla before any initial consonant cluster, including the [mm-] of /m:usannadätu-n/. Consequently, this connecting cluster must be marked in a different way, so I borrowed a conventional sign that also happened to be ASCII (another constraint). I added a comment because I knew it would intrigue you.

What I sense from our discussions is that you are including the morpho-phonological level of analysis in the discussion, whereas I try to stick to a script-oriented graphemic level. Both are abstract and very distinct from the tendency of the Unicode group to encode conventions that originated within the graphic industry without any particular discipline in analysis. Yet the Unicode standard has the explicit ambition to encode plain text, which I interpret as trying to encode a script in graphemic units: in minimal distinctive functional units of a given writing system, not in linguistic units or elements of a given type case.

>> What's the objection? It would be just as transparent as your
>> solution.
>
> I have to think some more about the paired vowels idea.
>
>> Anyway, I like your approach. If it is to find any acceptance, there
>> needs to be canonical equivalence with legacy encoding according to
>> this formula:
>>
>> TANWEEN = <vowel><small noon>
>>         = conventional tanween
>> TAMWEEM = <vowel><small meem>
>> IDGHAM  = <vowel><idgham code>
>>
>
> But I wouldn't call it <small noon>; we want to retain the semantics
> of tanween explicitly in the encoding element so that software
> doesn't have to infer tanween based on two codepoints. This is the
> kind of thing I mean when I say intelligence should be migrated from
> software to the encoding as much as possible.

I just used a distinctive name in plain language.
Obviously my preferred name for this code point would be ARABIC TANWEEN MARKER, along with ARABIC TAMWEEM MARKER and ARABIC IDGHAM MARKER. I agree that by calling it SMALL NOON it could be confused with the existing one-off small nuun code used for completing the word /nanjii/ (off the top of my head).

t
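
The canonical-equivalence formula quoted above (TANWEEN = <vowel><marker> = conventional tanween) can be illustrated with a rough Python sketch. The vowel and tanween code points are the existing Unicode ones; TANWEEN_MARKER is purely hypothetical, since no such marker is encoded, so a Private Use Area code point stands in for it. The function names decompose and compose are likewise made up for illustration. Only tanween is covered, because it is the one case for which the formula names a conventional equivalent.

```python
# Existing Unicode code points for the short vowels and legacy tanween marks.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
FATHATAN, DAMMATAN, KASRATAN = "\u064B", "\u064C", "\u064D"

# HYPOTHETICAL: stands in for the proposed ARABIC TANWEEN MARKER.
# U+E000 is a Private Use Area code point, not an encoded Arabic character.
TANWEEN_MARKER = "\uE000"

# Legacy tanween mark -> <vowel><marker> pair, following the formula.
LEGACY_TO_PAIR = {
    FATHATAN: FATHA + TANWEEN_MARKER,
    DAMMATAN: DAMMA + TANWEEN_MARKER,
    KASRATAN: KASRA + TANWEEN_MARKER,
}
PAIR_TO_LEGACY = {pair: legacy for legacy, pair in LEGACY_TO_PAIR.items()}


def decompose(text: str) -> str:
    """Rewrite each legacy tanween mark as <vowel><marker>."""
    return "".join(LEGACY_TO_PAIR.get(ch, ch) for ch in text)


def compose(text: str) -> str:
    """Rewrite each <vowel><marker> pair back to the legacy tanween mark."""
    for pair, legacy in PAIR_TO_LEGACY.items():
        text = text.replace(pair, legacy)
    return text


# kaf, kasra, teh, fatha, alef, beh, dammatan ("kitaabun")
word = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
paired = decompose(word)
restored = compose(paired)
```

Under these assumptions the two forms round-trip: decompose replaces the final DAMMATAN with DAMMA plus the placeholder marker, and compose restores the original string, which is the behaviour the equivalence would require of software handling both encodings.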

