STAND Workshop on Standardizing Tasks, meAsures and NLP Datasets
https://stand4nlp.github.io/

Full-day workshop in Paris, France, January 29th, 2024 (with partial hybrid access)

Abstract submission deadline: January 24th, 2024; earlier submissions
are welcome

Scientific context:

The current lack of standardized practices and definitions in NLP
hinders the progress of the field. Indeed, there is not always consensus on
which evaluation methods are meaningful and fruitful, or on which of their
implementations should be used with which parameters (e.g., SacreBLEU;
Post, 2018).
In some cases, there is no general agreement on the very definition of a
task.
This situation calls for work on *standardizing* NLP practices.

The International Organization for Standardization (ISO) has just created *a
dedicated working group on NLP* (a joint effort of the AI and Language
committees), and *two standards* are already under way. Topics under
consideration by the ISO standardization committees include NLP
terminology, evaluation metrics, interoperability, annotation guidelines,
documentation, and good practices for NLP development, evaluation and corpora.

These topics are already heavily discussed in academia, and a number of
informal guidelines have been proposed. We believe that the
creation of NLP standards can significantly benefit from the input of both
NLP academics and industry practitioners.
Reciprocally, NLP researchers would benefit from getting involved in the
standardization effort, ensuring that academia's views are heard,
in particular in the context of the *AI Act* (the European regulation
on AI, finalized in December 2023), whose enforcement will rely heavily
on those standards.

The STAND workshop is a research initiative whose goals are:

   - to foster discussion on existing standards, their creation and use
   - to assess the current needs of the community for standardization
   - to share experience on how the lack of good practices impacts
   research activities
   - to collect existing good practices (and propose new ones)


We invite contributions from NLP practitioners in both industry and
academia, as well as from standardization experts.

We invite two types of submission:
* short abstract: 1 page
* long abstract: 3 pages

Accepted submissions will be presented as posters. Authors accepted in the
long-abstract track will be invited to submit a full paper (5-10 pages)
after the workshop.
Topics for submissions include, but are not limited to:

   - Comparability and reproducibility of evaluation setups
   - Annotation guidelines
   - Evaluation metrics
   - Good practices for building, annotating and maintaining corpora
   - Good practices for system evaluation
   - Interoperability
   - Ethical guidelines
   - Guidelines for documenting corpora and models


Submission instructions:

   - Submissions are expected in PDF format by email to [email protected]
   - All submissions should be formatted using the ACL 2023 style files
   https://2023.aclweb.org/calls/style_and_formatting/.


============

PROGRAM AT A GLANCE:

[09:00-10:00] Welcome, introduction to standardization, ongoing activities
in NLP standardization, and the AI Act context
[10:15-11:50] Academic keynote (*Joakim Nivre*) and invited talks (*Matt
Post*, other speaker TBC)
[11:50-13:30] Poster session (with booster talks) & lunch
[13:30-14:40] Industry keynote (speaker TBC) and invited talk (*Dirk Hovy*)
[15:00-16:30] Moderator-led breakout discussions. Potential topics
include:
    - [sharing / drafting] Standardizing good practices for evaluation
    - [sharing / drafting] Standardizing good practices for corpus
management (collection, annotation, versioning)
    - [sharing / drafting] Standardizing evaluation metrics (definitions,
implementation, sharing scripts)
    - [sharing / drafting] Standardizing annotation schemes (formats and
guidelines)
    - [debate] Explainability and ethics in NLP: what are the needs for
standards?
    - [debate] Comparing standardization needs with the limitations of the
state of the art: how to bridge the gap?
    - [debate] Towards standardizing translations of technical terminology
in NLP: how to organize i18n?
[16:30-17:30] Reports from breakouts, definition of community-level actions
& wrap-up. Envisioned outcomes include:
    - Collection of existing good practices and drafting of new ones
    - Preparation of a joint submission for a position paper
    - Creation of common repositories for evaluation scripts and corpus
documentation


Workshop participants will be offered the opportunity to attend a
standardization committee meeting scheduled for the day after the
workshop (January 30th). The outputs of that meeting will be used in
direct support of the AI Act.

Remote access will be offered for only part of the workshop. In-person
participation is recommended where possible.
Posters will be presented in person only.


IMPORTANT DATES:
Abstract submission: any time up to January 24
Notification of acceptance: Within a few days of submission
Workshop: January 29
Standardization committee meeting: January 30


ORGANISING COMMITTEE:
Lauriane Aufrant, Timothée Bernard, Maximin Coavoux, Yoann Dupont, Arnaud
Ferré, Taras Holoyad, Rania Wazir

MORE INFORMATION:
For the latest information see the workshop page at
https://stand4nlp.github.io/; for any questions contact [email protected].