[Mt-list] 2nd Call for Participation - Second VarDial Evaluation Campaign co-located with COLING 2018

Zampieri, Marcos Mon, 26 Feb 2018 11:56:43 -0800

2nd Call for Participation - Second VarDial Evaluation Campaign

Within the scope of the VarDial workshop, co-located with COLING 2018, we are 
organizing an evaluation campaign on similar languages, varieties and dialects 
with multiple shared tasks.


URL: http://alt.qcri.org/vardial2018/index.php?id=campaign

We are organizing five shared tasks this year:

- (ADI) Arabic Dialect Identification: The third edition of the ADI task will 
address the multi-dialectal challenge in spoken Arabic in the broadcast news 
domain. Previously, we have shared acoustic features and lexical word sequences 
extracted from large-vocabulary continuous speech recognition (LVCSR). This 
year, we will add phonetic features, enabling researchers to combine prosodic 
and phonetic information, both of which help distinguish between different 
dialects.

- (GDI) German Dialect Identification: After a successful first edition of the 
(Swiss) German Dialect Identification task at VarDial 2017, we are organizing a 
second iteration of this task. We will again focus on four Swiss German dialect 
areas (Basel, Bern, Lucerne, Zurich), with the addition of a fifth area subject 
to data availability. We will provide updated and expanded speech transcripts 
for all dialect areas, and also release corresponding acoustic data as well as 
(predicted) part-of-speech tags.

- (MTT) Morphosyntactic Tagging of Tweets: This task focuses on morphosyntactic 
annotation (900+ labels) of non-canonical Twitter varieties of three 
South-Slavic languages -- Slovene, Croatian, and Serbian. Task participants 
will be provided with large manually annotated and raw canonical datasets, as 
well as small manually annotated Twitter datasets. 

- (DFS) Discriminating between Dutch and Flemish in Subtitles: The task focuses 
on determining whether a text is written in the Netherlandic or the Flemish 
variant of the Dutch language. For this task, participants are provided with a 
dataset consisting of almost 100,000 professionally produced subtitles for 
movies, documentaries and television shows. 

- (ILI) Indo-Aryan Language Identification: This task focuses on identifying 
five closely related languages of the Indo-Aryan language family – Hindi, Braj 
Bhasha, Awadhi, Bhojpuri, and Magahi. These languages form part of a continuum 
starting from Western Uttar Pradesh (Hindi and Braj Bhasha) to Eastern Uttar 
Pradesh (Awadhi and Bhojpuri) and the neighbouring Eastern state of Bihar 
(Bhojpuri and Magahi). For this task, participants will be provided with a 
dataset of approximately 15,000 sentences in each language, mainly from the 
domain of literature, published over the web as well as in print.

To participate and to receive the training data, please fill in the 
registration form available at the workshop website. The training sets will be 
released on March 12, 2018.

The VarDial workshop will take place in August 2018 in Santa Fe, USA.

Best, 
Marcos on behalf of the VarDial organizers

-----
Dr. Marcos Zampieri
Research Fellow
Research Group in Computational Linguistics
University of Wolverhampton, UK
http://pers-www.wlv.ac.uk/~u22984/