Apologies for cross-posting.

The Quranic Arabic Corpus (http://corpus.quran.com) is an international
collaborative linguistic project, initiated at the University of Leeds, that
aims to bridge the gap between the traditional Arabic grammar of i'rab and
techniques from modern computational
linguistics. This open source resource includes part-of-speech tagging for
the Quran, morphological segmentation and a formal representation of Quranic
syntax using dependency graphs. Version 0.4 of the corpus provides several
improvements over the previous release:

*** [Increased coverage for the syntactic treebank]. Version 0.4 of the
treebank covers 40% of the Quran by word count (30,895 out of 77,429 words).
The treebank provides syntactic annotation using dependency grammar for
chapters 1-8 and 59-114 of the Quran.
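
As a quick check of the coverage figures above, the arithmetic can be
sketched in a few lines of Python. The word counts and chapter ranges are
the ones quoted in this announcement; the function and variable names are
only illustrative and are not part of the corpus distribution.

```python
# Figures quoted in the announcement.
ANNOTATED_WORDS = 30_895
TOTAL_WORDS = 77_429

# Chapter ranges covered by the treebank (inclusive), as stated above.
COVERED_RANGES = [(1, 8), (59, 114)]


def coverage_percent(annotated, total):
    """Return the share of annotated words as a percentage."""
    return 100.0 * annotated / total


def chapters_covered(ranges):
    """Count the chapters in a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


percent = coverage_percent(ANNOTATED_WORDS, TOTAL_WORDS)
chapters = chapters_covered(COVERED_RANGES)
print(f"{percent:.1f}% of words, {chapters} chapters")
```

The word-count ratio comes out just under the rounded 40% figure, and the
two chapter ranges together span 64 of the Quran's 114 chapters.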

*** [Revised morphological analysis]. Following online collaboration by
volunteer annotators, over 500 suggestions have been cross-checked against
traditional sources of Arabic grammar, resulting in more accurate
morphological tagging.

*** [Improved Quran dictionary and lemmatization]. The list of roots and
lemmas that group related derived words has been made more consistent with
traditional Arabic lexicons. The online Quran dictionary now also includes
concordance lines from Quranic verses as context.

*** [Readability and navigation improvements]. The content of the website
has been better organized, with improvements to navigation and layout.
Several typing mistakes and omissions have been corrected in the
word-by-word interlinear translation into English.

*** [More accurate tagging of proper nouns]. Eight named entities that were
previously tagged only as nouns have been added to the semantic ontology:
Al-Ahqaf, Al-Jahiliyah, Al-Jumu'ah, Baal, Magians, Salsabil, Sirius, and
Zaqqum.

*** [More accurate tagging for particles waw and fa]. In accordance with
traditional Arabic grammar, the particle fa is now tagged as a supplemental
particle (harf za'id) in certain words, such as the combination a-fa-man.

*** [Version 0.4 of the morphologically annotated corpus] is freely
available for download from the Quranic Arabic Corpus website.

The Quranic Arabic Corpus is an open source project. Contributions or
questions about the research are more than welcome. Please direct any
correspondence to Kais Dukes, PhD researcher at the School of Computing,
University of Leeds:

web: www.kaisdukes.com
e-mail: [email protected]
