Mete Kural wrote:
> From: Gregg Reynolds <[EMAIL PROTECTED]>
> > Now, IMO a difficult design question is whether some true morphemes
> > should in fact be encoded. Obvious examples: the definite article,
> > other particles like laa, sawfa, sa-, direct object suffixes -hu,
> > -ha, etc. Unicode will never countenance something like that, but
> > that doesn't mean we shouldn't. Such design decisions should be made
> > strictly on a cost/benefit basis, IMO.
> I'd like to restate my opinion here that such morphemic encoding is
> better done at the markup level. So basically, encode the characters
> graphemically using Unicode, and then encode the morphemes on top of
> that at the markup level using an appropriate XML schema.
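A minimal sketch of the layered approach described just above, in Python: the characters stay ordinary Unicode, and the morpheme boundaries live only in the markup. The element names (`s`, `w`, `m`) and the `type` values are hypothetical, chosen for illustration; they are not taken from OSIS, TEI, or any other published schema. The sample sentence لا يعرفه اللاعب ("the player doesn't know him") is likewise made up for the example.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL schema: <s> holds <w> (words), each split into <m> (morphemes)
# carrying a type attribute. These names are illustrative only and are not
# taken from OSIS, TEI, or any other published schema.
SAMPLE = """<s>
  <w><m type="neg-particle">لا</m></w>
  <w><m type="verb">يعرف</m><m type="obj-suffix">ه</m></w>
  <w><m type="def-article">ال</m><m type="noun">لاعب</m></w>
</s>"""

def morphemes(xml_text: str, kind: str) -> list[str]:
    """Return the text of every morpheme of the given type."""
    root = ET.fromstring(xml_text)
    return [m.text for m in root.iter("m") if m.get("type") == kind]

def plain_text(xml_text: str) -> str:
    """Drop the markup and rebuild the ordinary graphemic Unicode text."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(m.text for m in w.iter("m")) for w in root.iter("w"))

if __name__ == "__main__":
    print(morphemes(SAMPLE, "neg-particle"))  # ['لا']
    print(plain_text(SAMPLE))                 # لا يعرفه اللاعب
```

Because the morpheme layer is kept separate, stripping the markup gives back plain Unicode text, while any XML-aware tool can query the morphemes directly.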
Understood. No argument from me on that point. Well, I might dispute
"better"; and we can probably have a discussion about just what is and
isn't a morpheme codepoint. As for Unicode, it would be great if they
would do the right thing; I just happen to think the design principles
of Unicode are inhospitable to some notions of character semantics that
would be very beneficial for Arabic. So I just don't think Unicode will
ever encode some of the things I'd like to see encoded. Doesn't mean
Unicode isn't useful.
I guess what I'm suggesting is an intellectual exercise in encoding
design. Do the cost/benefit analysis for each candidate codepoint, and
something like <negative-particle-laa> may not look so bad. The more
information you can pack into the encoding, the less money you have to
spend on higher-level software, and the more you can do with
non-specialized tools like grep. I'm not saying at this point that we
*should* encode such morphemes; only that the question is worth
evaluating in neutral, quantifiable terms.
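A toy Python illustration of the grep argument above, offered as a sketch rather than a proposal. It assumes a hypothetical dedicated codepoint for the negative particle, represented here by U+E000 from the Private Use Area; Unicode defines no such character. The sample sentences are made up. With plain graphemic text, a naive search for lam-alef also hits the letters inside اللاعب ("the player"), and a whole-word pattern then misses the particle once the conjunction wa- is attached. With the hypothetical codepoint, one dumb search is exact.

```python
import re

# HYPOTHETICAL: U+E000 (Private Use Area) stands in for a
# <negative-particle-laa> character. Unicode defines no such character.
NEG_LAA = "\uE000"

# Graphemic encoding: the particle is just the letters lam + alef.
PLAIN = "لا يعرفه اللاعب"        # "the player doesn't know him"
PLAIN_WA = "ولا يعرفه اللاعب"    # same, with the conjunction wa- attached

# Morphemic encoding of the same two sentences.
MORPH = NEG_LAA + " يعرفه اللاعب"
MORPH_WA = "و" + NEG_LAA + " يعرفه اللاعب"

def count(pattern: str, text: str) -> int:
    """Count non-overlapping regex matches (roughly what grep -o | wc -l reports)."""
    return len(re.findall(pattern, text))

# A naive substring search also hits the lam-alef inside اللاعب.
naive = count("لا", PLAIN)                        # 2
# A whole-word pattern fixes that, but misses the particle once wa- is attached.
whole_word = r"(?<!\S)لا(?!\S)"
standalone = count(whole_word, PLAIN)             # 1
attached = count(whole_word, PLAIN_WA)            # 0
# With a dedicated codepoint, the same simple search is exact in both cases.
coded = count(re.escape(NEG_LAA), MORPH)          # 1
coded_wa = count(re.escape(NEG_LAA), MORPH_WA)    # 1

if __name__ == "__main__":
    print(naive, standalone, attached, coded, coded_wa)  # 2 1 0 1 1
```

Getting the graphemic search right would take a morphological analyzer, which is exactly the "higher-level software" cost in the trade-off; the price of the codepoint approach is a nonstandard encoding that ordinary fonts and tools would not understand.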
> Please take a look at what OSIS (www.bibletechnologies.com) has done.
> They have already done a lot of this kind of morpheme-based encoding
> at the markup level.
Thanks. I believe the Text Encoding Initiative has a bunch of stuff
like that too.
-g
_______________________________________________
General mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/general