[Corpora-List] 2nd Call for Papers: Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2023)

Adam via Corpora Wed, 01 Mar 2023 01:54:00 -0800

[Apologies for cross-posting]

The second workshop on resources and representations for under-resourced languages and domains (RESOURCEFUL-2023, https://resourceful-workshop.github.io/resourceful-2023/index.html) explores the role of the kind and quality of resources that are available to us, and the challenges and directions for constructing new resources in light of the latest trends in natural language processing. The workshop is co-located with NoDaLiDa 2023 (https://www.nodalida2023.fo/) at Tórshavn, Faroe Islands on May 22nd-24th, 2023.

Data-driven machine-learning techniques in natural language processing have achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but in order to do so large quantities of quality data (which is mostly text) are required. Interpretability studies of large language models in both text-only and multi-modal setups have revealed that even in cases where large text datasets are available, the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where data is focused towards only some contexts. A question has also been raised whether textual data is enough to capture the semantics of natural language, or whether other modalities such as visual representations or the situated context of a robot might be required. Annotator-based resources have been constructed over years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically.

The purpose of the workshop is to initiate a discussion between the two communities involved in building resources (data vs annotation-based) and exploring their synergies for the new challenges in natural language processing. We encourage contributions in the areas of resource creation, representation learning and interpretability in data-driven and expert-driven machine learning setups and both uni-modal and multi-modal scenarios.

In particular we would like to open a forum by bringing together students, researchers, and experts to address and discuss the following questions:

 - What is relevant linguistic knowledge the models should capture and how can this knowledge be sampled and extracted in practice?
 - What kind of linguistic knowledge do we want and can capture in different contexts and tasks?
 - To what degree are resources that have been traditionally aimed at rule-based natural language processing approaches relevant today both for machine learning techniques and hybrid approaches?
   - How can they be adapted for data-driven approaches?
 - To what degree can data-driven approaches be used to facilitate expert-driven annotation?
   - What are current challenges for expert-based annotation?
 - How can crowd-sourcing and citizen science be used in building resources?
   - How can we evaluate and reduce unwanted biases?

Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented with invited talks and presentations of on-going or completed research.

This workshop is a continuation of the first workshop on resources and representations for under-resourced languages and domains held together with the SLTC 2020, https://gu-clasp.github.io/resourceful-2020/.


** Important dates:

 - Submission deadline for archival papers: 28th March 2023
 - Submission deadline for non-archival papers: 4th April 2023
 - Notification of acceptance: 25th April 2023
 - Camera-ready version: 9th May 2023
 - Workshop date: 22nd May 2023

All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").

** Submission

We invite submissions of long papers (8 pages), short papers (4 pages), and extended abstracts describing work in progress (2 pages). Submissions can report negative results and be opinion pieces. Both papers and extended abstracts can include any number of pages for references. All submissions must follow the NoDaLiDa template, available in both LaTeX and MS Word at the official conference website (https://www.nodalida2023.fo/authorkit-nodalida23). Submissions must be anonymous and submitted in the PDF format through OpenReview.

We also invite submissions of non-archival papers related to our theme already presented or published at other venues. These can be submitted in their original formatting. They will be reviewed by the workshop organisers and the accepted ones will be posted on the workshop website.


Authors may be asked to contribute peer-reviews of papers.

** Workshop organisers

Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo
