-------- Original Message --------

A new version of the *Crescent Quran Corpus* is now freely available online at http://quran.uk.net. The corpus contains both morphological and syntactic annotation of the Quran in Arabic. Previous releases focused on the morphology of Classical Arabic; this release adds an in-progress syntactic treebank of the Quran. New features in this release include:

(1) *Natural Language Generation* (NLG) has been applied to provide English summaries of the morphology of each Arabic word of the Quran. For example:

*The fourth word of verse (21:70) is divided into 4 morphological segments. A conjunction, verb, subject pronoun and object pronoun. The prefixed conjunction fa is usually translated as "then" or "so". The perfect verb (fi3il mad) is first person masculine plural. The verb's root is jim 3ayn lam (j 3 l). The attached object pronoun is third person masculine plural.*

See http://quran.uk.net/TokenDetail.aspx?location=(21:70:4)

(2) *Syntactic Treebank*. Syntactic annotation of the Quran has been expanded, using a hybrid dependency / constituency framework that follows traditional Arabic grammar (i'3raab). Syntactic annotation is now available for chapters 67 to 114. See http://quran.uk.net/Treebank.aspx. Morphological annotation, including part-of-speech tagging, has been reviewed and improved for the whole Quran.

(3) *Quran Java API*. A Quran Java API for the text of the corpus has been integrated into the website and is freely available for download.

(4) *Grammar Documentation and Annotation Guidelines*. The website now includes a comprehensive set of documentation on Arabic dependency grammar, which also serves as a set of guidelines for corpus annotators.

(5) *Audio Improvements*. A selection of 10 audio options is available, including an audio English translation of the text for each verse in the corpus.

(6) *Arabic/English Lexicon of the Quran*. The lexicon now includes root counts for each entry.

(7) *Improved Visualization*. The website provides improved visualization for 700 dependency graphs, with better website layout and navigation.

----------------------------------------------------------------------

*Interested in becoming a volunteer annotator?* We are currently looking for native Arabic speakers to assist in corpus annotation, and in particular syntactic annotation.

The Crescent corpus is an open source community project that aims to produce accurate multi-level annotation of the Quran in Classical Arabic, including morphological and syntactic annotation. The framework adopted for syntactic annotation is that of traditional Arabic dependency grammar (i'3raab). For more information on the corpus, please contact the main project researcher:

Kais Dukes
School of Computing
University of Leeds
United Kingdom

Soraya Zaidi ([email protected])
URL: http://sites.google.com/site/sorayazaidi
GRIA (Groupe de Recherche en Intelligence Artificielle)
LRI (Laboratoire de recherche en Informatique) (http://www.lri-annaba.net/)
Université Badji Mokhtar Annaba (http://www.univ-annaba.org/)
Algeria

