> Hi all

Assalamu Alaykum,

> Thank you for adding my email to the list.
> I am a PhD student in Morocco.
> I have some questions about the POS tagset.
> How was this POS tagset built? (the approach)

The approach used to construct the POS tagset was to follow traditional
Arabic grammar. There are many existing syntactic analyses of the Quran in
book form. To make these machine-readable, we decided to build a Quranic
linguistic database that stores grammatical and morphological information
for each word of the Quran. We wanted to stick closely to traditional
terminology to make the website accessible to a wide audience (including
many non-linguists) who are already familiar with the traditional names for
parts-of-speech in Quranic Arabic. The POS tags summarize the existing terms
used in traditional Arabic grammar. For example: (صفة) = Adjective,
(ظرف زمان) = Time adverb, (ظرف مكان) = Location adverb, etc. This is covered
in the online documentation on the website
(http://corpus.quran.com/documentation/tagset.jsp) and discussed further in
various academic publications (http://corpus.quran.com/publications.jsp).

> Why are REL and DEM under pronouns and not under nouns?

REL stands for relative pronoun (اسم موصول) and DEM stands for demonstrative
pronoun (اسم اشارة). These are both types of pronoun. Note that in figure 1
on the page you are referring to
(http://corpus.quran.com/documentation/tagset.jsp), the pronouns, nouns,
adjectives and adverbs are all grouped together under "nominals". This is
because in the part-of-speech hierarchy of traditional Arabic grammar, there
are three top-level groups: the nominals, the verbs and the particles. So I
think we are both in agreement here. The only difference is that the website
uses the term "nominal" for the top-level group (adjectives, nouns,
pronouns, etc.) and reserves the term "noun" for the more specific tag that
is actually used to annotate words.
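The grouping described above can be pictured as a simple two-level mapping from top-level groups to the specific tags used for annotation. The Python sketch below is illustrative only: ADJ, REL and DEM are the tags discussed in this thread, while the other codes (N, PRON, T, LOC, V, P) and the helper name `top_level_group` are assumptions made for the example, so please treat the tagset page as the authoritative list.

```python
# Illustrative two-level POS hierarchy: top-level group -> {tag: description}.
# ADJ, REL and DEM come from this thread; N, PRON, T, LOC, V and P are
# assumed tag codes used only for the sake of the example.
POS_HIERARCHY = {
    "nominal": {
        "N": "noun",
        "ADJ": "adjective (صفة)",
        "PRON": "personal pronoun",
        "DEM": "demonstrative pronoun (اسم اشارة)",
        "REL": "relative pronoun (اسم موصول)",
        "T": "time adverb (ظرف زمان)",
        "LOC": "location adverb (ظرف مكان)",
    },
    "verb": {
        "V": "verb",
    },
    "particle": {
        "P": "preposition",
    },
}


def top_level_group(tag: str) -> str:
    """Return the top-level group ('nominal', 'verb' or 'particle') of a tag."""
    for group, tags in POS_HIERARCHY.items():
        if tag in tags:
            return group
    raise KeyError(f"unknown tag: {tag}")


print(top_level_group("REL"))  # nominal
print(top_level_group("N"))    # nominal
```

Both calls return "nominal", which is the point made above: a relative pronoun belongs to the nominal group without being tagged as a noun.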

> Why are adjectives not subcategorized?

Well, we could subcategorize them. What did you have in mind? Any
suggestions are welcome. So far, people seem pretty happy with the tag ADJ =
adjective = صفة. Note that the tagging is performed automatically through a
computer algorithm and then manually verified online by volunteers. It would
be good to have a reference to a published work that covers the
subcategorization you are suggesting. You may want to use the online message
board (http://corpus.quran.com/messageboard.jsp) to tag any new types of
adjective you come across in the Quran. That would be helpful.

> There are fewer than 31 particles. I think some classes of particles are
> missing?

One thing to keep in mind about particles is that their distribution has a
"long tail". The 30 or so existing tags we have for particles cover more
than 99% of actual occurrences (indeed, about 5 or 6 tags cover 80% of
cases). There are some things we are still working on (e.g. additional uses
of the particle lam, or rare uses of the particle waw), but on the whole the
existing tagset is quite comprehensive in its coverage of particles,
especially when compared to other tagged Arabic corpora. However, it would
be great to know more about the particles you had in mind. You may want to
use the online message board (http://corpus.quran.com/messageboard.jsp) to
tag any new particles you come across in the Quran. Again, this would be
quite helpful.
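For anyone who wants to check a "long tail" claim like this against real tag counts, here is a minimal Python sketch. It assumes you already have a frequency count per particle tag (how you extract it from the corpus data is left open); the function name and the toy counts below are hypothetical and are not real corpus figures.

```python
from collections import Counter


def tags_needed_for_coverage(counts: Counter, threshold: float) -> int:
    """Return the smallest number of most frequent tags whose combined
    occurrences reach at least `threshold` (a fraction between 0 and 1)
    of all tagged words."""
    total = sum(counts.values())
    if total == 0:
        return 0  # nothing tagged, so no tags are needed
    covered = 0
    # most_common() yields (tag, frequency) pairs, most frequent first
    for n, (_tag, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= threshold:
            return n
    return len(counts)  # only reached if threshold > 1.0


# Toy counts for four hypothetical particle tags (not real corpus data)
toy_counts = Counter({"A": 50, "B": 30, "C": 15, "D": 5})
print(tags_needed_for_coverage(toy_counts, 0.80))  # 2
print(tags_needed_for_coverage(toy_counts, 0.99))  # 4
```

Taking tags in order of decreasing frequency is what makes the answer minimal: no other choice of tags reaches a given coverage threshold with fewer tags.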

Thanks and Kind Regards,

-- Kais Dukes

Language Research Group
School of Computing
University of Leeds

http://corpus.quran.com - The Quranic Arabic Corpus

On Wed, Jan 6, 2010 at 9:41 PM, sidrine1 <sidri...@yahoo.fr> wrote:
>
> Hi all
>
> Thank you for adding my email to the list.
>
> I am a PhD student in Morocco.
>
> I have some questions about the POS tagset.
>
> How was this POS tagset built? (the approach)
>
> Why are REL and DEM under pronouns and not under nouns?
>
> Why are adjectives not subcategorized?
>
> There are fewer than 31 particles. I think some classes of particles are
> missing?
>
> Seddik SIDRINE
>
>
