[Apologies for cross-posting]

The second workshop on resources and representations for under-resourced
languages and domains (RESOURCEFUL-2023) explores the role of the kind and
quality of the resources available to us, as well as the challenges and
directions for constructing new resources in light of the latest trends in
natural language processing.

Data-driven machine-learning techniques in natural language processing have
achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but to do so they
require large quantities of quality data, most of which is text.
Interpretability studies of large language models in both text-only and
multi-modal setups have revealed that even where large text datasets are
available, the models still do not cover all the contexts of human social
activity and are prone to capturing unwanted bias where the data is skewed
towards only some contexts. The question has also been raised whether textual
data is enough to capture the semantics of natural language, or whether other
modalities, such as visual representations or the situated context of a robot,
might be required. Annotator-based resources have been constructed over the
years, building on theoretical work in linguistics, psychology and related
fields, and a large amount of work has been done both theoretically and
practically.

The purpose of the workshop is to initiate a discussion between the two
communities involved in building resources (data-driven vs. annotation-based)
and to explore their synergies for the new challenges in natural language
processing. We encourage contributions in the areas of resource creation,
representation learning and interpretability, in data-driven and expert-driven
machine learning setups and in both uni-modal and multi-modal scenarios.
 
In particular we would like to open a forum by bringing together students, 
researchers, and experts to address and discuss the following questions:

 - What is the relevant linguistic knowledge that models should capture, and
how can this knowledge be sampled and extracted in practice?
 - What kind of linguistic knowledge do we want to capture, and what can we
capture, in different contexts and tasks?
 - To what degree are resources that have been traditionally aimed at 
rule-based natural language processing approaches relevant today both for 
machine learning techniques and hybrid approaches?
 - How can they be adapted for data-driven approaches?
 - To what degree can data-driven approaches be used to facilitate
expert-driven annotation?
 - What are current challenges for expert-based annotation?
 - How can crowd-sourcing and citizen science be used in building resources?
 - How can we evaluate and reduce unwanted biases?

Intended participants are researchers, PhD students and practitioners from
diverse backgrounds (linguistics, psychology, computational linguistics,
speech, computer science, machine learning, computer vision, etc.). We foresee
an interactive workshop with plenty of time for discussion, complemented by
invited talks and presentations of ongoing or completed research.

This workshop is a continuation of the first workshop on resources and
representations for under-resourced languages and domains, held together with
SLTC 2020: https://gu-clasp.github.io/resourceful-2020/


** Important dates:
 - Submission deadline for archival papers: 28th March 2023
 - Submission deadline for non-archival papers: 4th April 2023
 - Notification of acceptance: 25th April 2023
 - Camera-ready version: 9th May 2023
 - Workshop date: 22nd May 2023

All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").


** Submission
We invite submissions of long papers (8 pages), short papers (4 pages), and
extended abstracts describing work in progress (2 pages). Submissions may
report negative results or be opinion pieces. Both papers and extended
abstracts can include any number of pages for references. All submissions must
follow the NoDaLida template, which is available in both LaTeX and MS Word
from the official conference website:
https://www.nodalida2023.fo/authorkit-nodalida23
Submissions must be anonymous and submitted in PDF format through OpenReview.

We also invite submissions of non-archival papers related to our theme already 
presented or published at other venues. These can be submitted in their 
original formatting. They will be reviewed by the workshop organisers and the 
accepted ones will be posted on the workshop website. 

Authors may be asked to contribute peer reviews of papers.

** Workshop organisers
Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo


---
Adam Ek
PhD Student
Centre for Linguistic Theories and Studies in Probability (CLASP)
Department of Philosophy, Linguistics and Theory of Science
University of Gothenburg
[email protected]





_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]
