Re: Proposal for the Basis of a Codepoint Extension to Unicode for the Encoding of the Quranic Manuscripts

Gregg Reynolds Tue, 21 Jun 2005 08:52:04 -0700

Abdulhaq Lynch wrote:

This is a working document to enable a consensus to be established regarding a private use area to extend the Unicode arabic specification in order to support encoding the quran in a clear, simple and complete way.
This document is not complete but details basic steps for moving forward.


Nice work.  A few suggestions:

a. Ignore Unicode. Focus on the needs of your community. Get the theory right first and you'll be able to generate proposals for Unicode later if you think it useful.

b. Focus on semantic categories, not "characters", and don't bias "representation" towards "glyphs" or visual representations in general. For example, your proposal for "ikhfaa" is something that hadn't occurred to me. If you're only interested in producing a visual representation of text, then arguably it isn't needed. But what if you want to generate an audio representation? Or if you just want to analyze the encoded text? Then it seems to be pretty useful.

c. Your proposal rightly diverges from Unicode. So why stop with new specialized semantic categories? Fix what's broken in Unicode. For example, Unicode's idea of tanween is pretty bad, IMO. If I could design it again I would have a single tanween character to be added after the vowel signs. The compound hamza "characters" in Unicode should be decomposed too, IMO. Textual analysis would be much easier then. Then of course there's the bidi fallacy in all its ridiculous glory. There are lots of ways to better capture the semantics of Arabic text, but the Unicode bunch is unlikely to ever approve of such an approach.

d. You don't need higher-level grammars like XML. My own opinion is that the primary goal of an encoding design should be to migrate intelligence out of the application and into the text, subject to the syntactic constraints of a plain text encoding. So long as you can give a clear and concise definition of a particular semantic category, it is a good candidate for encoding as plain text.
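
To make point c concrete, here is a rough Python sketch of what such a decomposed encoding might look like. It is only an illustration under stated assumptions: the private-use code point U+E000 for the single tanween mark and the function name decompose are hypothetical and not part of any proposal. The hamza step relies on the fact that Unicode already defines canonical decompositions for the compound hamza letters (for example U+0623 to U+0627 U+0654), which NFD normalization applies.

```python
import unicodedata

# Unicode's three tanween signs, each mapped to the short vowel it is built on.
TANWEEN_TO_VOWEL = {
    "\u064B": "\u064E",  # FATHATAN -> FATHA
    "\u064C": "\u064F",  # DAMMATAN -> DAMMA
    "\u064D": "\u0650",  # KASRATAN -> KASRA
}

# Hypothetical private-use code point standing in for a single tanween mark.
# This value is an assumption made for illustration only.
TANWEEN = "\uE000"


def decompose(text: str) -> str:
    """Rewrite text so that hamza and tanween are separate units."""
    # NFD splits the precomposed hamza letters into base letter + hamza mark.
    # It may also reorder adjacent combining marks into canonical order.
    text = unicodedata.normalize("NFD", text)
    # Replace each tanween sign with its vowel sign followed by the tanween mark.
    return "".join(
        TANWEEN_TO_VOWEL[ch] + TANWEEN if ch in TANWEEN_TO_VOWEL else ch
        for ch in text
    )
```

With text in this form, an analysis tool could look for the vowel, the tanween mark, or the hamza independently instead of special-casing every precomposed combination.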

I once came across a relevant message from none other than Richard Stallman. It was on a list for gcc development, in response to a question about conformance to the ISO definition of C. RMS' response was simply that standards are merely recommendations, and that the needs of the community take precedence. Which seems very wise to me; Unicode is so riddled with problems it is bound to be superseded some day, so blindly following it even where it doesn't meet the needs of one's community seems questionable.


keep up the good work,

-gregg

_______________________________________________
General mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/general
