Assalamu Alaykum wa Rahmatallah wa Barakatu

Great to hear from you Fatma. I hope you don't mind, I am sending the reply to 
your question below to the comp-quran discussion list. It may be useful to note 
the approach we have followed so far, for other future volunteer annotators on 
the project. Thanks again for joining the computational Quranic project, and 
also for being willing to review some of the annotation in the corpus. It is through 
the many volunteers constantly reviewing the text that this project is steadily 
progressing towards one of its primary goals - a highly accurate annotated 
tagged Arabic Quran. The project follows the methodology of open on-line 
collaborative annotation, which as you point out requires some way to resolve 
differences of opinion.

You raise a very good question below - what to do in situations where there 
could be more than one tag for each word, or possibly different analyses?

In general, given enough time I hope that at a later stage we can consider 
tagging the corpus with different points of view for these rare more difficult 
cases. However, for the vast majority of words most annotators are usually in 
full agreement. So far, inter-annotator agreement has been quite high on the 
part-of-speech and morphological tagging projects. It would appear that so far 
the issues are more to do with correcting mistakes that may still remain from 
the initial automatic tagging of the Quran through algorithmic means. It is 
usually rare that two annotators will disagree on the correct grammatical 
analysis for a word in the Quran. However, for the more interesting cases, I 
would consider discussing these on the open message board on the corpus website:

http://corpus.quran.com/messageboard.jsp

The message board is also a good place to get a feel for what annotation 
involves, and is also a chance to see what the more common issues are when 
considering part-of-speech tagging for the Quran. When using the message board, 
you will get to engage with many different volunteer annotators, who are also 
all looking at the tags in detail, and may be able to shed new light or a 
different point of view on the tagging for a specific word. It is also 
interesting to note that there now seems to be more or less a general consensus 
on the message board, and among annotators, to adopt the following text as a 
primary reference:

http://www.archive.org/download/imkam12

This is mostly because this reference is quite detailed, and has proven through 
experience to be fairly consistent in its analysis. I would suggest that you 
might want to download the required volume from the set of 12 PDF books from 
the above link. These form a very comprehensive and detailed syntactic and 
morphological analysis for the Quran, written in Arabic, that covers each word 
for each verse in a fair amount of detail with regards to its syntactic 
position and role in its sentence. At the moment, the Quranic project has 
adopted the point of view of going for a single analysis for each word, but at 
the same time consistency is encouraged by adopting a primary reference for 
analysis in difficult cases. However, please do feel free to also discuss 
things on the message board as well, as this is often a great way for us all to 
also learn more from each other about Quranic Arabic grammar. I myself have 
personally learnt a great deal by listening in to some of these on-line 
discussions, a good example being the different types of Gender in Quranic 
Arabic (see http://corpus.quran.com/documentation/gender.jsp).

You may also be interested to review the bibliography page here:

http://corpus.quran.com/bibliography.jsp

This page contains a fairly comprehensive set of references and web links for 
Quranic Arabic and also traditional grammar. Also, the corpus website itself 
contains a set of annotation guidelines that we have built up over time 
covering some of the more interesting cases of Quranic Grammar that you may 
come across during annotation:

http://corpus.quran.com/documentation/grammar.jsp

So in summary, I would suggest that for the more difficult cases you might 
consider following the analysis found in the primary reference 
(http://www.archive.org/download/imkam12) as this is more or less what we have 
all done so far through mutual agreement, and hence allows for good consistency 
on the website.

If you have any further questions please don't hesitate to ask. I would be more 
than happy to help.

wa alaykum assalam,

-- Kais Dukes

Language Research Group
School of Computing
University of Leeds

http://corpus.quran.com - The Quranic Arabic Corpus


On Mon, Jan 18, 2010 at 9:39 AM, Fatma Said <fatmaalmus...@yahoo.com> wrote:
> Wa'alykum salaam
>
> Dear Kais,
>
> Thanks for the reply, yes I will do just that perhaps begin with Surah 5 and
> then see how that goes from there. By the way I just read your interview in
> the Muslim news my email was playing up so it just came in now- that was a
> really good interview. Okay so I'll begin and if I have any problems I'll be
> emailing you :-). What do we do in the instance where there is a difference
> of opinion on the grammatical status of the word? Do we add both or just the
> one we prefer? Like the argument of some words being تمييز tamyeez or صفة
> siffah? Thanks once again and hope to hear from you soon.
>
> Ma3asalamah
>
> Fatma
