I wanted to share this news so that we can get some feedback. All
comments are welcome.

---------- Forwarded message ----------
From: Mark Dredze <mdre...@cs.jhu.edu>
To: c...@cs.jhu.edu
Date: Thu, 4 Feb 2010 05:13:39 -0500
Subject: Proposal for the NAACL Mechanical Turk Workshop

Congratulations!  Your proposal for the NAACL Workshop on Creating
Speech and Language Data With Amazon’s Mechanical Turk was selected to
receive the $100 credit. In order to receive the credit, could you
please confirm that:

1. You intend to submit a short paper (4 pages) describing what
results from your proposal
2. You or one of your co-authors will attend the workshop at NAACL in
LA (Note that the price of attending the workshop is higher than the
$100 credit)
3. You will submit your data and your HIT templates along with your
paper, and that it's OK to publish them on the workshop web site. If
you cannot share your data due to licensing, please let us know the
restrictions. For commonly available corpora (LDC), we will allow the
posting of the annotations with instructions for how to merge them
with the corpus.

If you agree with all of that, then please send us your Amazon.com
account name and we'll forward it to Amazon Mechanical Turk so that
they can apply the credit. You are of course welcome to add your own
funds as well.

Finally, we've set up a wiki on GitHub so that people can trade tips
and advice about using Mechanical Turk:

We've put up two hints. One shows how to record information about
what country your Turkers live in. The other shows some JavaScript for
highlighting words by clicking on them and recording which words are
clicked. Please add to the wiki!

Best Regards,
Chris Callison-Burch and Mark Dredze

p.s. Please let your co-authors know that your proposal was awarded
since we're only sending these notifications to the lead author.

Project Proposal Draft

Quran corpus annotation with Amazon’s Mechanical Turk

Wajdi Zaghouani                      Kais Dukes
Linguistic Data Consortium      School of Computing
University of Pennsylvania       University of Leeds
waj...@ldc.upenn.edu            s...@leeds.ac.uk

1- Quran project presentation
The Quranic Arabic Corpus is an open source project hosted by the
Language Research Group at the University of Leeds. The aim of this
project is to provide a richly annotated linguistic resource for
researchers wanting to study the language of the Quran.
The Quranic Arabic Corpus provides an annotated linguistic resource
which shows the Arabic grammar, syntax and morphology for each word in
the Holy Quran. The corpus is divided into two levels of analysis:
morphological annotation and a syntactic treebank.

2- Current annotation and needs

Currently, the annotation is provided by volunteer annotators who are
mostly Arabic linguists. Corrections to the corpus can easily be made
online by clicking on an Arabic word and then posting the desired
suggestion, which is reviewed before being included in the corpus.
Moreover, a message board was created to provide a discussion space for
various issues and suggestions regarding the project.

3- Proposed experiment using Mechanical Turk

Mechanical Turk opens new possibilities for annotating speech and
text.

We are very interested in running an experiment to evaluate the
effectiveness of using Mechanical Turk to perform corrections and
annotations of the Quran corpus, especially by comparing the existing
message board corrections with a Mechanical Turk solution. Which
approach would produce better quality? What annotation volume would the
new approach yield? Could the two approaches complement each other?

Paying for suggested corrections to part-of-speech tagging might
encourage individuals with knowledge of the Arabic language to
participate who would not otherwise do so. It may also lead to better
quality of work and higher consistency than unpaid volunteer annotation.

The existing website already provides all the required infrastructure to
begin this experiment (an online part-of-speech tagging tool). The
proposed experiment would contrast and compare the two approaches in
terms of annotator speed, inter-annotator agreement, and tagging accuracy.
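
As a rough illustration of how two of these measures could be computed,
the sketch below calculates tagging accuracy against a reference
annotation and Cohen's kappa between two annotators. It is only a
minimal sketch under stated assumptions: the function names and the tag
labels are hypothetical and are not taken from the corpus or its
tagging tool, and Cohen's kappa is just one possible agreement measure,
not necessarily the one the experiment would use.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of predicted tags that match the reference (gold) tags."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        raise ValueError("tag sequences must not be empty")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same words."""
    if len(tags_a) != len(tags_b):
        raise ValueError("tag sequences must have the same length")
    n = len(tags_a)
    if n == 0:
        raise ValueError("tag sequences must not be empty")
    # Observed agreement: share of words given the same tag by both.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement, from each annotator's own tag distribution.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical tag throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical tags for four words (labels invented for illustration).
    gold = ["N", "V", "N", "P"]
    volunteer = ["N", "V", "N", "N"]
    turker = ["N", "V", "V", "P"]
    print("volunteer accuracy:", accuracy(volunteer, gold))
    print("turker accuracy:", accuracy(turker, gold))
    print("kappa:", cohen_kappa(volunteer, turker))
```

Kappa corrects raw agreement for the agreement expected by chance, which
matters when a few tags dominate the data. Annotator speed could be
compared separately, for example from submission timestamps.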
