Hello Yassine,

I would suggest you download the file *quranic-corpus-text-0.1.zip* from
http://corpus.quran.com/download. You will need to fill in a simple form
with your e-mail address. The file is delimited by the pipe character (|)
and contains the Arabic text as well as part-of-speech tags, so I would
look for the PN (proper noun) tags and extract these from the file you
download.
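
A minimal sketch of that extraction in Python. The message above only says the file is pipe-delimited and carries part-of-speech tags, so the column positions (`word_col`, `tag_col`), the function name, and the sample rows below are all hypothetical; check them against the actual file before relying on this.

```python
def extract_proper_nouns(lines, word_col=0, tag_col=1, tag="PN"):
    """Return the words whose tag field equals `tag`.

    `word_col` and `tag_col` are assumed column positions; the real
    layout of the corpus file may differ.
    """
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("|")]
        if len(fields) <= max(word_col, tag_col):
            continue  # skip rows that are too short to hold both columns
        if fields[tag_col] == tag:
            found.append(fields[word_col])
    return found


# Made-up rows for illustration only (transliterated, not real corpus data).
sample = [
    "kitAb|N",
    "Allah|PN",
    "",
    "qAla|V",
    "Musa|PN",
]
print(extract_proper_nouns(sample))
```

To run it on the downloaded file, pass an open file handle instead of the sample list, for example `open(path, encoding="utf-8")`, since the function only iterates over lines.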

This file has a small error: the last verse of each chapter is missing, but
I think we can just account for that in your analysis for now. Sorry, this
is the only file available at present. Version 0.2 of the corpus (to be
released later in January) will contain updated and corrected files.

I guess from your point of view, it's interesting to know the F-measure
of your tagger. From the point of view of development of the Quranic
Arabic Corpus, it's interesting to know if your Named Entity Tagger for
Arabic can add any new information to the corpus.
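
For reference, a small sketch of how such an F-measure could be computed, treating the corpus PN tags as the gold standard and the tagger output as predictions. The representation of each item as a (position, word) pair and the sample data are assumptions made for illustration; proper nouns and named entities do not coincide exactly, so the numbers would only be indicative.

```python
def f_measure(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical (position, word) pairs, not real corpus data.
gold = {(1, "Allah"), (5, "Musa"), (9, "Ibrahim")}
predicted = {(1, "Allah"), (5, "Musa"), (7, "kitAb"), (12, "qAla")}

p, r, f = f_measure(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Items in `predicted` but not in `gold` are also the candidates for new information: either tagger errors or entities that the PN tagging does not cover.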

Looking forward to hearing from you.

Kind Regards,

-- Kais
On Wed, Jan 6, 2010 at 7:30 PM, Yassine Benajiba
<benajibayass...@gmail.com> wrote:

> Hello Kais,
>
> I prefer the comparison based on the data that you have and I can do that
> pretty quickly if you can send me a file which has two columns (similar to
> the ones I pointed you to) where the first column has the word and the
> second has your annotation. If you are able to do that I can tell you both
> the F-measure of my NER system on the Quran text and the differences in the
> annotation.
>
> Cheers,
> --Yassine.
>
> On Wed, Jan 6, 2010 at 1:07 PM, Kais Dukes <dukes.k...@googlemail.com> wrote:
>
>> Hello Yassine,
>>
>> I took a look at this data and it looks good. However, I am wondering if
>> this has produced just a subset of the proper noun tagging that we already
>> have in the annotated corpus. I guess the next thing to do is to try to
>> estimate the quality of this analysis. A quick and easy way to do this would
>> be to process the data to produce a sorted frequency list. For example, it
>> would be good to see a list of all the named entities and how many times
>> they occur (sorted by frequency), e.g.:
>>
>> Allah = 2000
>> Ibrahim = 100
>> Musa = 50
>> etc.
>>
>> Of course I would think that the named entities would be in Arabic (not
>> English as per the above example) based on the data you have. I thought
>> about writing a quick script to scrape through the data files you posted and
>> do this myself, however ...  If you are interested in doing this at your
>> end, that would be great! If we had this sorted frequency list of named
>> entities, we could then quickly eyeball the data and it would give an
>> impression as to how good the tagging was.
>>
>> The other approach we have is to use the PN tag (proper nouns) already in
>> the corpus. So what I can do is then compare this list of named entities
>> with your own. So far, based on a quick look at the data, it looks like your
>> named entity software is not giving us anything beyond the proper noun
>> tagging that we already have. However, once we get the sorted frequency
>> list we can make a better judgement.
>>
>> I am hopeful that your NER software will produce more data than the PN
>> tags that we have already, so that we can then use this as a basis for the
>> named entity project we are working on here at the moment.
>>
>> Keep up the good work - this looks great so far!
>>
>> Kind Regards,
>>
>> -- Kais Dukes
>>
>> Language Research Group
>> School of Computing
>> University of Leeds
>>
>> http://quran.corpus.com - The Quranic Arabic Corpus
>>   On Wed, Jan 6, 2010 at 2:42 PM, Yassine Benajiba <
>> benajibayass...@gmail.com> wrote:
>>
>>> Hi Kais,
>>>
>>> I have finally got to run my models on the Quran. You can download the
>>> result files from this link:
>>> http://www1.ccls.columbia.edu/~ybenajiba/QuranNER/
>>>
>>> There are two files there: the first has only the NEs without classes,
>>> and the other has the classes. If there is anything else I can do (for
>>> instance, if you want to see the features that I have used), please let
>>> me know.
>>>
>>> Best,
>>> --Yassine.
>>>
>>>
>>> On Tue, Jan 5, 2010 at 1:32 PM, Kais Dukes <dukes.k...@googlemail.com> wrote:
>>>
>>>> Hi Yassine,
>>>>
>>>> If you want to run your named entity tagging algorithms against the
>>>> Arabic text of the Quran, that would be very interesting. I think for this
>>>> initial experiment, perhaps just capture all NEs without classification,
>>>> i.e. keep things simple?
>>>>
>>>> I would be interested to see what results this produces.
>>>>
>>>> Looking forward to hearing from you.
>>>>
>>>> Kind Regards,
>>>>
>>>> -- Kais
>>>>
>>>> On Tue, Jan 5, 2010 at 4:05 PM, Yassine Benajiba <
>>>> benajibayass...@gmail.com> wrote:
>>>>
>>>>> Hi Kais,
>>>>>
>>>>> I am sorry about that, I didn't realize there were many mailing lists;
>>>>> I just clicked on "Reply All". Anyway, I have got the text of the Quran
>>>>> and I will start the processing asap. If you are not interested in the NE
>>>>> classes, let me know, because my system performs better when it only has
>>>>> to capture the NEs without classifying them.
>>>>>
>>>>> Cheers,
>>>>> --Yassine.
>>>>>
>>>>>
>>>>> On Mon, Jan 4, 2010 at 12:55 PM, Kais Dukes <dukes.k...@googlemail.com> wrote:
>>>>>
>>>>>> Hello Yassine,
>>>>>>
>>>>>> Hopefully named entity tagging is something that we can discuss on the
>>>>>> new mailing list!
>>>>>>
>>>>>> It would be great to see how your software runs against the Quran. If
>>>>>> you are willing to perform this experiment for us, then that would be
>>>>>> very helpful as it might speed up the construction of the named entity
>>>>>> tagging we are hoping to add to the website at some stage.
>>>>>>
>>>>>> The Quranic Arabic Corpus uses the Arabic text of the Quran from the
>>>>>> Tanzil project, which you can download here (with various options):
>>>>>> http://tanzil.info/wiki/Download_Quran_Text
>>>>>>
>>>>>> If you are looking to perform a simple experiment, just to see how
>>>>>> things work, I would recommend the simple text as a first try (not the
>>>>>> Uthmani script) as this may have orthography which is closer to Modern
>>>>>> Standard Arabic. If you have any other questions, please don't hesitate
>>>>>> to ask, I would be happy to help.
>>>>>>
>>>>>> Looking forward to hearing from you.
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> -- Kais
>>>>>>
>>>>>
>>>>
>>>
>>
>
