(apologies for cross-posting)

==== Call for Participation ====

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification -
Co-located with the 4th Arabic Natural Language Processing Workshop (WANLP
2019 <http://wanlp2019.arabic-nlp.net/>) and ACL 2019 in Florence,
Italy (August 1, 2019).

Website:  https://sites.google.com/view/madar-shared-task/
Registration Link:
https://docs.google.com/forms/d/e/1FAIpQLSe3zUMW_gWY6oHU9QHqkxN_QRAgx3Z8kY8MCaYrrfMBlZPkzQ/viewform

Introduction
Arabic dialect identification is the task of automatically labeling a
segment of speech or text with the dialect it comes from. Most previous
work and shared tasks on dialect identification have focused on
regional-level dialect labeling, as in efforts by Zaidan and Callison-Burch,
Elfardy and Diab, and the VarDial ADI evaluation campaign. This shared task
will be the first to target a large set of dialect labels at the city and
country levels. The data for the shared task was created or collected under
the Multi-Arabic Dialect Applications and Resources (MADAR) project.

Shared Task
There are two subtasks in this shared task.

Subtask 1: MADAR Travel Domain Dialect Identification. The data for this
subtask is the same as that reported in the following papers:

Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al.
(2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the
11th International Conference on Language Resources and Evaluation. (PDF:
http://www.lrec-conf.org/proceedings/lrec2018/pdf/351.pdf)

Salameh, M., Bouamor, H. & Habash, N. (2018). Fine-Grained Arabic Dialect
Identification. In Proceedings of the 27th International Conference on
Computational Linguistics. (PDF: http://aclweb.org/anthology/C18-1113)

Subtask 2: MADAR Twitter User Dialect Identification. This is a new data
set created for this shared task.

Metrics: The evaluation metrics will include precision, recall, F-score,
and accuracy, in addition to a new hierarchical evaluation metric designed
for Arabic dialects. Macro-averaged F-score will be the official metric.
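
As a rough illustration of the official metric, the sketch below computes a
macro-averaged F-score in plain Python. It is not the organizers' scoring
script: the function name macro_f1, the choice to average over every label
seen in either the gold or the predicted list, and the label strings in the
usage note are all assumptions made for the example.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per label, then average with equal weight.

    Assumption: the label set is the union of gold and predicted labels.
    The official scoring script may define the label set differently.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("gold and pred must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct prediction for label g
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed

    f1_scores = []
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)

    return sum(f1_scores) / len(f1_scores)
```

For example, with hypothetical labels, macro_f1(["CAI", "CAI", "BEI", "DOH"],
["CAI", "BEI", "BEI", "DOH"]) averages the three per-label scores 2/3, 2/3,
and 1. Because each label counts equally, a rare dialect affects the score as
much as a frequent one, unlike accuracy.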

Participants need to register using the registration link below. All
participating teams will be provided with a common training data set and a
common development set. No external manually labeled data sets are
allowed. A blind test data set will be used to evaluate the output of the
participating teams. An evaluation script will also be provided to all
teams. All teams are required to report results on both the development
and test sets in their write-ups.

The shared task will be hosted through CODALAB (Links TBD).

Registration Link:
https://docs.google.com/forms/d/e/1FAIpQLSe3zUMW_gWY6oHU9QHqkxN_QRAgx3Z8kY8MCaYrrfMBlZPkzQ/viewform

IMPORTANT DATES
December 10, 2018: First announcement of the shared task

January 7, 2019: Announcement of shared task website and the beginning of
registration

January 28, 2019: Release of initial training data and scoring script

March 18, 2019: Final training data release

April 29, 2019: Registration deadline

May 6, 2019: Test set made available

May 13, 2019: Systems' outputs collected

May 27, 2019: Shared task system paper submissions due

June 17, 2019: Notification of acceptance

June 24, 2019: Camera-ready version of shared task system papers due

August 1, 2019: ACL 2019 Workshop in Florence

TASK ORGANIZERS
Houda Bouamor (Fortia Financial Solutions, France)
Sabit Hasan (Carnegie Mellon University Qatar, Qatar)
Nizar Habash (New York University Abu Dhabi, UAE)

CONTACT
For any questions related to this task, please post to this Google group,
or contact the organizers directly using the following email address:
[email protected]

----

*Wajdi Zaghouani, Ph.D.*

*Assistant Professor*
College of Humanities and Social Sciences

P.O. Box 34110 | Education City | Doha, Qatar
tel: +974 4454 5601 | mob: +974 33454992

[email protected]| Office A141, LAS Building