[Apologies for cross-posting]
The second workshop on resources and representations for under-resourced
languages and domains (RESOURCEFUL-2023,
https://resourceful-workshop.github.io/resourceful-2023/index.html)
explores the role of the kind and quality of resources available to us,
as well as the challenges and directions for constructing new resources
in light of the latest trends in natural language processing.
The workshop is co-located with NoDaLiDa 2023
(https://www.nodalida2023.fo/), held in Tórshavn, Faroe Islands, on May
22nd-24th, 2023.
Data-driven machine-learning techniques in natural language processing
have achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but to
do so they require large quantities of high-quality data, mostly text.
Interpretability studies of large language models in both text-only and
multi-modal setups have revealed that even where large text datasets are
available, the models still do not cover all the contexts of human
social activity and are prone to capturing unwanted bias when the data
is skewed towards only some contexts. The question has also been raised
whether textual data alone is enough to capture the semantics of natural
language, or whether other modalities, such as visual representations or
the situated context of a robot, are also required.
Annotation-based resources have been constructed over many years on the
basis of theoretical work in linguistics, psychology and related fields,
and a large amount of work has been done both theoretically and
practically. The purpose of the workshop is to initiate a discussion
between the two communities involved in building resources (data-based
vs. annotation-based) and to explore their synergies for the new
challenges in natural language processing. We encourage contributions
in the areas of resource creation, representation learning and
interpretability in data-driven and expert-driven machine learning
setups, in both uni-modal and multi-modal scenarios.
In particular, we would like to open a forum that brings together
students, researchers, and experts to address and discuss the following
questions:
- What relevant linguistic knowledge should the models capture, and how
can this knowledge be sampled and extracted in practice?
- What kind of linguistic knowledge do we want to capture, and what can
we capture, in different contexts and tasks?
- To what degree are resources that have traditionally been aimed at
rule-based natural language processing approaches relevant today, both
for machine learning techniques and for hybrid approaches?
- How can they be adapted for data-driven approaches?
- To what degree can data-driven approaches be used to facilitate
expert-driven annotation?
- What are the current challenges for expert-based annotation?
- How can crowd-sourcing and citizen science be used in building
resources?
- How can we evaluate and reduce unwanted biases?
Intended participants are researchers, PhD students and practitioners
from diverse backgrounds (linguistics, psychology, computational
linguistics, speech, computer science, machine learning, computer vision,
etc.). We foresee an interactive workshop with plenty of time for
discussion, complemented with invited talks and presentations of
ongoing or completed research.
This workshop is a continuation of the first workshop on resources and
representations for under-resourced languages and domains, held together
with SLTC 2020 (https://gu-clasp.github.io/resourceful-2020/).
** Important dates:
- Submission deadline for archival papers: 28 March 2023
- Submission deadline for non-archival papers: 4 April 2023
- Notification of acceptance: 25 April 2023
- Camera-ready version: 9 May 2023
- Workshop date: 22 May 2023
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
** Submission
We invite submissions of long papers (8 pages), short papers (4 pages),
and extended abstracts describing work in progress (2 pages).
Submissions may report negative results or be opinion pieces. Both
papers and extended abstracts can include any number of pages for
references. All submissions must follow the NoDaLiDa template, which is
available in both LaTeX and MS Word formats from the official conference
website, https://www.nodalida2023.fo/authorkit-nodalida23
Submissions must be anonymous and submitted in PDF format through
OpenReview.
We also invite non-archival submissions of papers related to our theme
that have already been presented or published at other venues. These
can be submitted in their original formatting. They will be reviewed by
the workshop organisers, and the accepted ones will be posted on the
workshop website.
Authors may be asked to contribute peer-reviews of papers.
** Workshop organisers
Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo