[Corpora-List] CfP: VarDial 2024 and Shared Tasks

Yves Scherrer via Corpora Wed, 21 Feb 2024 01:28:32 -0800

VarDial 2024, the eleventh workshop on NLP for similar languages, varieties and 
dialects, will be held in conjunction with NAACL in Mexico City, on June 20/21, 
2024.


We welcome papers dealing with one or more of the following topics:
- Corpora, resources, and tools for similar languages, varieties and dialects;
- Adaptation of tools (taggers, parsers) for similar languages, varieties and 
dialects;
- Evaluation of language resources and tools when applied to language varieties;
- Reusability of language resources in NLP applications (e.g., for machine 
translation, POS tagging, syntactic parsing, etc.);
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to mutual intelligibility between dialects and 
similar languages;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., 
semantic discrepancies, lexical gaps, false friends);
- Machine translation between closely related languages, language varieties and 
dialects.
In addition to the topics listed above, we also welcome papers dealing with 
diachronic language variation (e.g., phylogenetic methods, historical dialects).

Paper submission deadline: March 10, 2024 (AoE)
Details: https://sites.google.com/view/vardial-2024/call-for-papers

The VarDial workshop has a history of hosting well-attended shared tasks on 
various dialects and languages. In 2024, we are organizing the following two 
tasks:

1. The DIALECT-COPA shared task on dialectal causal commonsense reasoning

This shared task invites the community to propose, develop, and test approaches 
for adapting models for causal commonsense language understanding to three 
dialects of South-Slavic languages: the Slovenian Cerkno dialect, the Croatian 
Chakavian dialect, and the Serbian, Macedonian and Bulgarian Torlak dialect. 
Training and development data based on the COPA (Choice of Plausible 
Alternatives, Roemmele et al. 2011) dataset are available for four related 
standard languages (Slovenian, Croatian, Serbian, Macedonian) and for two of 
the three test dialects (Cerkno, Torlak); the Chakavian dialect serves as a 
surprise dialect.

2. DSL-ML - Multi-label classification of similar languages

The DSL-ML task is a multi-label extension of the classic "Discriminating 
similar languages" task, which has been popular at VarDial since the workshop's 
beginnings. The motivation behind this new task formulation is that some texts 
contain no linguistic markers that unambiguously determine their origin. It 
therefore makes sense to predict several possible labels for such texts. The 
2024 DSL-ML task is based on multi-label conversions of existing 
datasets from five different macro-languages: English, Spanish, Portuguese, 
French and BCMS (Bosnian, Croatian, Montenegrin, Serbian).

Test results submission deadline: March 11, 2024 (AoE)
System description paper submission deadline: March 24, 2024 (AoE)
Registration: https://forms.gle/UcLYcPgDFJoiAVip7 
Details: https://sites.google.com/view/vardial-2024/shared-tasks

