Dear colleagues,

[apologies for cross-posting]

We would like to remind you that this year SIGTYP is hosting a Shared Task
on Word Embedding Evaluation for Ancient and Historical Languages:
https://github.com/sigtyp/ST2024/

Test data has been released, and the CodaLab competitions are up and
running, so we encourage you to register if you haven't already! There is
still a week before the deadline. :)

*Summary*
In recent years, sets of downstream tasks called benchmarks have become a
very popular, if not default, method to evaluate general-purpose word and
sentence embeddings. Starting with decaNLP (McCann et al., 2018) and
SentEval (Conneau & Kiela, 2018), multitask benchmarks for NLU keep
appearing and improving every year. However, even the largest multilingual
benchmarks, such as XGLUE, XTREME, XTREME-R or XTREME-UP (Hu et al., 2020;
Liang et al., 2020; Ruder et al., 2021, 2023), only include modern
languages. When it comes to ancient and historical languages, scholars
mostly adapt/translate intrinsic evaluation datasets from modern languages
or create their own diagnostic tests. We argue that there is a need for a
universal evaluation benchmark for embeddings learned from ancient and
historical language data and view this shared task as a proving ground for
it.

The shared task involves solving the following problems for 12+ ancient and
historical languages that belong to 4 language families and use 6 different
scripts. Participants will be invited to describe their system in a paper
for the SIGTYP workshop proceedings. The task organizers will write an
overview paper that describes the task, summarizes the different
approaches taken, and analyzes their results.

*Subtasks*
For subtask A, participants are not allowed to use any additional data;
however, they can reduce and balance provided training datasets if they see
fit. For subtask B, participants are allowed to use any additional data in
any language, including pre-trained embeddings and LLMs.

A. Constrained

   1.     POS-tagging
   2.     Full morphological annotation
   3.     Lemmatisation

B. Unconstrained

   1.     POS-tagging
   2.     Detailed morphological annotation
   3.     Lemmatisation
   4.     Filling the gaps
      - Word-level
      - Character-level


*Important links*

   - Registration form:
   https://docs.google.com/forms/d/e/1FAIpQLSdINgMfzzZGIZ-uBVQhvyndB6yeaaj-wT7v45A6UB4F2h6QBQ/viewform?usp=sf_link
   - Detailed description, incl. submission format:
   https://github.com/sigtyp/ST2024
   - Constrained subtask on CodaLab:
   https://codalab.lisn.upsaclay.fr/competitions/16822
   - Unconstrained subtask on CodaLab:
   https://codalab.lisn.upsaclay.fr/competitions/16818


*Important dates*

   - *05 Nov 2023*: Release of training and validation data
   - *02 Jan 2024*: Release of test data
   - *09 Jan 2024*: Submission of results for Phase 1 of the Constrained
   Subtask
   - *12 Jan 2024*: Submission of results for Phase 2 of the Constrained
   Subtask and for the Unconstrained Subtask
   - *13 Jan 2024*: Notification of results
   - *20 Jan 2024*: Submission of shared task papers
   - *27 Jan 2024*: Notification of acceptance to authors
   - *03 Feb 2024*: Camera-ready
   - *15 Mar 2024*: Video recordings due
   - *21/22 Mar 2024*: SIGTYP workshop


Kind regards,

Oksana and the organisers' team

-- 

Oksana Dereza  | PhD student on the Cardamom
<http://cardamom.insight-centre.org/> project | Unit for Linguistic Data |
Insight Centre for Data Analytics | Data Science Institute | University of
Galway

Oksana Dereza  | Iarrthóir PhD ar thionscadal Cardamom
<http://cardamom.insight-centre.org/> | An tAonad um Shonraí Teangeolaíocha
| Insight, Ionad na hAnailísíochta Sonraí | Institiúid Eolaíochta Sonraí |
Ollscoil na Gaillimhe
