Abdulhaq Lynch wrote:
This is a working document to enable a consensus to be established regarding a
private use area to extend the Unicode Arabic specification in order to
support encoding the Quran in a clear, simple and complete way.
This document is not complete but details basic steps for moving forward.
Nice work. A few suggestions:
a. Ignore Unicode. Focus on the needs of your community. Get the
theory right first and you'll be able to generate proposals for Unicode
later if you think it useful.
b. Focus on semantic categories, not "characters", and don't bias
"representation" towards "glyphs" or visual representations in general.
For example, your proposal for "ikhfaa" is something that hadn't
occurred to me. If you're only interested in producing a visual
representation of text, then arguably it isn't needed. But what if you
want to generate an audio representation? Or if you just want to
analyze the encoded text? Then it seems to be pretty useful.
c. Your proposal rightly diverges from Unicode. So why stop with new
specialized semantic categories? Fix what's broken in Unicode. For
example, Unicode's idea of tanween is pretty bad, IMO. If I could
design it again I would have a single tanween character to be added
after the vowel signs. The compound hamza "characters" in Unicode
should be decomposed too, IMO. Textual analysis would be much easier
then. Then of course there's the bidi fallacy in all its ridiculous
glory. There are lots of ways to better capture the semantics of Arabic
text, but the Unicode bunch is unlikely to ever approve of such an approach.
d. You don't need higher-level grammars like XML. My own opinion is
that the primary goal of an encoding design should be to migrate
intelligence out of the application and into the text, subject to the
syntactic constraints of a plain text encoding. So long as you can give
a clear and concise definition of a particular semantic category, it is
a good candidate for encoding as plain text.
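
To make point (c) a bit more concrete, here is a minimal Python sketch of
what a decomposed representation could look like. It is only an
illustration under stated assumptions: the TANWEEN code point (U+E000, in
the private use area) and the decompose() helper are hypothetical names
picked for this example, not part of your proposal or of Unicode. The hamza
step relies on the canonical decompositions Unicode already defines (NFD);
the tanween step is the part Unicode does not offer.

```python
import unicodedata

# Hypothetical private-use code point standing in for a single "tanween"
# mark. U+E000 is an arbitrary choice for illustration only.
TANWEEN = "\uE000"

# Unicode's three tanween signs, each rewritten as short vowel + TANWEEN.
TANWEEN_MAP = {
    "\u064B": "\u064E" + TANWEEN,  # FATHATAN -> FATHA + tanween
    "\u064C": "\u064F" + TANWEEN,  # DAMMATAN -> DAMMA + tanween
    "\u064D": "\u0650" + TANWEEN,  # KASRATAN -> KASRA + tanween
}


def decompose(text: str) -> str:
    """Return text with compound hamza letters and tanween signs split up."""
    # NFD applies Unicode's canonical decompositions, which split the
    # compound hamza letters (e.g. U+0623 -> U+0627 ALEF + U+0654 HAMZA ABOVE).
    decomposed = unicodedata.normalize("NFD", text)
    # Then replace each tanween sign by its vowel plus the single marker.
    return "".join(TANWEEN_MAP.get(ch, ch) for ch in decomposed)


def count_tanween(text: str) -> int:
    """Count tanween occurrences with a single test, whatever the vowel."""
    return decompose(text).count(TANWEEN)
```

With this, decompose("\u0623") yields ALEF followed by HAMZA ABOVE, and
count_tanween() needs to look for one code point instead of three, which is
the kind of simplification for textual analysis I have in mind.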
I once came across a relevant message from none other than Richard
Stallman. It was on a list for gcc development, in response to a
question about conformance to the ISO definition of C. RMS' response
was simply that standards are merely recommendations, and that the needs
of the community take precedence. Which seems very wise to me; Unicode
is so riddled with problems it is bound to be superseded some day, so
blindly following it even where it doesn't meet the needs of one's
community seems questionable.
keep up the good work,
-gregg
_______________________________________________
General mailing list
http://lists.arabeyes.org/mailman/listinfo/general