> Hi all

Assalamu Alaykum,

> Thank you for adding my email to the list.
> I am Phd Student in Morocoo.
> I have some questions about POS.
> How this POS was built ? (the approach)

The approach used to construct the POS tagset was to follow traditional Arabic grammar. Many syntactic analyses of the Quran already exist in book form. To make these machine-readable, we decided to build a Quranic linguistic database that stores grammatical and morphological information for each word of the Quran. We wanted to stick closely to traditional terminology to make the website accessible to a wide audience (including many non-linguists) who are already familiar with the traditional names for parts of speech in Quranic Arabic.

The POS tags are a summary of the existing terms used in traditional Arabic grammar. For example: (صفة) = Adjective, (ظرف زمان) = Time adverb, (ظرف مكان) = Location adverb, etc. The online documentation on the website covers this information (http://corpus.quran.com/documentation/tagset.jsp), and it is discussed further in various academic publications (http://corpus.quran.com/publications.jsp).

> Why REL and DEM are under pronouns and not under nouns ?

REL stands for relative pronoun (اسم موصول) and DEM stands for demonstrative pronoun (اسم اشارة). These are both types of pronoun. Note that in figure 1 on the page you are referring to (http://corpus.quran.com/documentation/tagset.jsp) the pronouns, nouns, adjectives and adverbs are all grouped together under "nominals". This is because in the part-of-speech hierarchy of traditional Arabic grammar there are three top-level groups: the nominals, the verbs and the particles.

So I think we are both in agreement here. The only difference is that the website uses the term "nominal" for the top-level group (adjectives, nouns, pronouns, etc.) and reserves the term "noun" for the more specific tag that is actually used to annotate words.

> Why adjectives are not subcategorized ?

Well, we could subcategorize them. What did you have in mind? Any suggestions are welcome. So far, people seem pretty happy with the tag ADJ = adjective = صفة. Note that the tagging is performed automatically by a computer algorithm and then manually verified online by volunteers. It would be good to have a reference to a published work that covers the subcategorization you are suggesting. You may want to use the online message board (http://corpus.quran.com/messageboard.jsp) to tag any new types of adjective you come across in the Quran. That would be helpful.

> There are less than 31 Particles, i think some class of particles are missed ?

One thing to keep in mind about particles is that they follow a "long tail" distribution. The 30 or so tags we currently have for particles cover more than 99% of actual word occurrences (indeed, about 5 or 6 tags cover 80% of cases). There are some things we are still working on (e.g. additional uses of the particle lam, or rare uses of the particle waw), but on the whole the existing tagset is quite comprehensive with regard to coverage of particles, especially when compared to other tagged Arabic corpora.

However, it would be great to know more about the particles you had in mind. You may want to use the online message board (http://corpus.quran.com/messageboard.jsp) to tag any new particles you come across in the Quran. Again, this would be quite helpful.

Thanks and Kind Regards,

--
Kais Dukes
Language Research Group
School of Computing
University of Leeds
http://corpus.quran.com - The Quranic Arabic Corpus

On Wed, Jan 6, 2010 at 9:41 PM, sidrine1 <sidri...@yahoo.fr> wrote:
> > Hi all
> > Thank you for adding my email to the list.
> > I am Phd Student in Morocoo.
> > I have some questions about POS.
> > How this POS was built ? (the approach)
> > Why REL and DEM are under pronouns and not under nouns ?
> > Why adjectives are not subcategorized ?
> > There are less than 31 Particles, i think some class of particles are missed ?
> >
> > Seddik SIDRINE
> >
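
The long-tail point about particle tags in the reply above (a handful of tags covering most occurrences) can be checked with a short script. What follows is only a minimal sketch: the function names are made up for illustration, and the counts are invented placeholders, not figures taken from the Quranic Arabic Corpus. With real per-tag counts substituted in, it would report how many of the most frequent tags are needed to reach a given share of all particle occurrences.

```python
from itertools import accumulate


def cumulative_coverage(tag_counts):
    """Return [(tag, cumulative_share)] ordered from most to least frequent tag."""
    ranked = sorted(tag_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ranked)
    running = accumulate(count for _, count in ranked)
    return [(tag, cum / total) for (tag, _), cum in zip(ranked, running)]


def tags_needed(tag_counts, target):
    """Smallest number of top-ranked tags whose combined share reaches `target`."""
    for rank, (_, share) in enumerate(cumulative_coverage(tag_counts), start=1):
        if share >= target:
            return rank
    return len(tag_counts)


# Hypothetical counts for eight placeholder particle tags (NOT corpus data).
example_counts = {
    "P1": 450, "P2": 250, "P3": 150, "P4": 80,
    "P5": 40, "P6": 20, "P7": 7, "P8": 3,
}

if __name__ == "__main__":
    for target in (0.80, 0.95):
        n = tags_needed(example_counts, target)
        print(f"{n} tags cover at least {target:.0%} of occurrences")
```

Sorting by frequency before accumulating is what makes the long tail visible: the first few ranks account for most of the total, and each further tag adds very little.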