Assalamu Alaykum wa Rahmatallah wa Barakatu

Great to hear from you, Fatma. I hope you don't mind that I am sending the reply to your question below to the comp-quran discussion list. It may be useful for future volunteer annotators on the project to see the approach we have followed so far.

Thanks again for joining the computational Quranic project, and also for being willing to review some of the annotation in the corpus. It is through the many volunteers constantly reviewing the text that this project is steadily progressing towards one of its primary goals - a highly accurate, annotated and tagged Arabic Quran. The project follows the methodology of open on-line collaborative annotation, which as you point out requires some way to resolve differences of opinion.

You raise a very good question below - what to do in situations where there could be more than one tag for a word, or possibly different analyses? In general, given enough time, I hope that at a later stage we can consider tagging the corpus with different points of view for these rare, more difficult cases. However, for the vast majority of words annotators are usually in full agreement. So far, inter-annotator agreement has been quite high on the part-of-speech and morphological tagging projects. It would appear that so far the issues are more to do with correcting mistakes that may still remain from the initial automatic tagging of the Quran. It is rare that two annotators will disagree on the correct grammatical analysis for a word in the Quran.

However, for the more interesting cases, I would consider discussing these on the open message board on the corpus website:

http://corpus.quran.com/messageboard.jsp

The message board is also a good place to get a feel for what annotation involves, and a chance to see what the more common issues are when considering part-of-speech tagging for the Quran. When using the message board, you will get to engage with many different volunteer annotators, who are also all looking at the tags in detail, and may be able to shed new light or offer a different point of view on the tagging for a specific word.

It is also interesting to note that there now seems to be more or less a general consensus on the message board, and among annotators, to adopt the following text as a primary reference:

http://www.archive.org/download/imkam12

This is mostly because this reference is quite detailed, and has proven through experience to be fairly consistent in its analysis. I would suggest that you download the required volume from the set of 12 PDF books at the above link.
These form a very comprehensive and detailed syntactic and morphological analysis of the Quran, written in Arabic, that covers each word of each verse in a fair amount of detail with regard to its syntactic position and role in its sentence.

At the moment, the Quranic project has adopted the approach of going for a single analysis for each word, while encouraging consistency by adopting a primary reference for analysis in difficult cases. However, please do feel free to also discuss things on the message board, as this is often a great way for us all to learn more from each other about Quranic Arabic grammar. I have personally learnt a great deal by listening in to some of these on-line discussions, a good example being the different types of gender in Quranic Arabic (see http://corpus.quran.com/documentation/gender.jsp).

You may also be interested to review the bibliography page here:

http://corpus.quran.com/bibliography.jsp

This page contains a fairly comprehensive set of references and web links for Quranic Arabic and also traditional grammar. Also, the corpus website itself contains a set of annotation guidelines that we have built up over time, covering some of the more interesting cases of Quranic grammar that you may come across during annotation:

http://corpus.quran.com/documentation/grammar.jsp

So in summary, I would suggest that for the more difficult cases you consider following the analysis found in the primary reference (http://www.archive.org/download/imkam12), as this is more or less what we have all done so far through mutual agreement, and hence allows for good consistency on the website.

If you have any further questions please don't hesitate to ask. I would be more than happy to help.

wa alaykum assalam,

--
Kais Dukes
Language Research Group
School of Computing
University of Leeds
http://corpus.quran.com - The Quranic Arabic Corpus

On Mon, Jan 18, 2010 at 9:39 AM, Fatma Said <fatmaalmus...@yahoo.com> wrote:
> Wa'alykum salaam
>
> Dear Kais,
>
> Thanks for the reply, yes I will do just that perhaps begin with Surah 5 and
> then see how that goes from there. By the way I just read your interview in
> the Muslim news my email was playing up so it just cam in now- that was a
> really good interview. Okay so I'll begin and if I have any problems I'll be
> emailing you :-). What do we do in the instance where there is a difference
> of opinion on the grammatical status of the word? Do we add both or just the
> one we prefer? Like the arguement of some words being تمييز tamyeez or صفة
> siffah? Thanks once again and hope to hear from you soon.
>
> Ma3asalamah
>
> Fatma