Hi Luis,
How are you?
I just saw on Corpora-list that you are deep into chatbot topics.
You may already have received the info: we are organizing a shared
task that might interest you and your group.
Hope you will participate ;-)
Best regards,
Paolo
-----
*Apologies for cross-posting*
Do you believe machine-generated text is becoming an issue? Are you
interested in boosting research on the automatic detection of
machine-generated text? 🤖👩🏻
We cordially invite researchers and practitioners from all fields to
participate in the AuTexTification task. If you are interested,
register for the shared task through this link: https://lnkd.in/dzBZsYiD
Once you have registered and the training phase has started, the
datasets will be sent to your email address along with a password.
More information on the task description, schedule, and submissions
is available on the AuTexTification web page:
https://sites.google.com/view/autextification
More information on the shared task
A new era of automatic content generation has arrived with powerful
causal language models such as GPT, PaLM, or BLOOM, which can be used
to spread untruthful news, human-looking reviews, or opinions. It is
therefore imperative to develop technology that automatically detects
generated text for content moderation and attributes generated text
to specific models, in order to protect intellectual property or to
assign responsibility. In this context, we propose the “Automatic Text
Identification” (AuTexTification) shared task to boost research and
development of systems that detect text generated by state-of-the-art
language models, in English and Spanish.
We propose two subtasks: (i) Human or Generated, where participants
will have to determine whether a given text has been automatically
generated or not; and (ii) Model Attribution, where participants will
have to determine which model generated a given text. The models used
to generate the texts vary in size, ranging from 2 to 175 billion
parameters, so participants' systems should be versatile enough to
handle a diverse set of text generation models and writing styles.
In the training phase, participants will be provided with two
partitions for subtask 1, i.e., English and Spanish partitions, with
binary labels 👩🏻 and 🤖. Similarly, a partition per language will be
released for subtask 2. It will include six labels (A, B, C, D, E, and
F), each label representing a text generation model. Later, the
unlabeled test data will be released.
Important Dates
March 22, 2023: Release of training data
April 21, 2023: Release of test data
May 10, 2023: Participant system results submission
May 17, 2023: Results notification
June 3, 2023: Paper submission
June 16, 2023: Paper peer-reviewed
July 4, 2023: Camera-ready paper version
September 26, 2023: Conference
Task organizers
José Ángel González (Symanto) Contact Email: [email protected]
Areg Sarvazyan (Symanto) Contact Email: [email protected]
Marc Franco-Salvador (Symanto)
Francisco Rangel (Symanto)
Berta Chulvi (Universitat Politècnica de València)
Paolo Rosso (Universitat Politècnica de València)
Please reach out to the organizers or join the Slack workspace to
connect with the other participants and organizers:
https://lnkd.in/di_zaMHf