CALL FOR PARTICIPATION
 
==============================================================
DiscoMT 2017 Shared Task on Cross-lingual Pronoun Prediction
==============================================================

Website: https://www.idiap.ch/workshop/DiscoMT/shared-task 
At the 3rd Workshop on Discourse in Machine Translation (co-located with EMNLP 
2017)

We are pleased to announce an exciting cross-lingual pronoun prediction task 
for people interested in (discourse-aware) machine translation, anaphora 
resolution and machine learning in general.

In the cross-lingual pronoun prediction task, participants are asked to predict 
a target-language pronoun given a source-language pronoun in the context of a 
sentence. For example, in the English-to-French sub-task, the goal is to predict 
the correct French translation of "it" or "they" (ce, elle, elles, il, ils, 
ça, cela, on, OTHER). You may use any type of information that can be extracted 
from the documents. We provide training and development data and a simple 
baseline system using an N-gram language model.

Participants are invited to submit systems for the English-French, 
English-German, German-English and Spanish-English language pairs.

More details can be found below, and on our website: 
https://www.idiap.ch/workshop/DiscoMT/shared-task 

Important Dates:

March 2017      Release of training data
2 May 2017      Release of test data
9 May 2017      System submission deadline
15 May 2017     Release of results
9 June 2017     System paper submission deadline
30 June 2017    Notification of acceptance
14 July 2017    Camera-ready papers due


Discussion group: 
https://groups.google.com/forum/#!forum/discomt2017-cross-lingual-pronoun-prediction-shared-task

-------------------------------------------------------------------------
Acknowledgements:
The organisation of this task has received support from the following project: 
Discourse-Oriented Statistical Machine Translation funded by the Swedish 
Research Council (2012-916)
-------------------------------------------------------------------------



=========================
Detailed Task Description
=========================

OVERVIEW

Pronoun translation poses a problem for current MT systems as pronoun systems 
do not map well across languages, e.g., due to differences in gender, number, 
case, formality, or humanness, and to differences in where pronouns may be 
used. Translation divergences typically lead to mistakes in MT output, as when 
translating the English "it" into French ("il", "elle", or "cela"?) or into 
German ("er", "sie", or "es"?). One way to model pronoun translation is to 
treat it as a cross-lingual pronoun prediction task.

We propose such a task, which asks participants to predict a target-language 
pronoun given a source-language pronoun in the context of a sentence. We 
further provide a lemmatised target-language human-authored translation of the 
source sentence, and automatic word alignments between the source sentence 
words and the target-language lemmata. In the translation, the words aligned to 
a subset of the source-language third-person pronouns are substituted by 
placeholders. The aim of the task is to predict, for each placeholder, the word 
that should replace it from a small, closed set of classes, using any type of 
information that can be extracted from the documents.

The cross-lingual pronoun prediction task will be similar to the task of the 
same name at WMT16:

http://www.statmt.org/wmt16/pronoun-task.html 

Participants are invited to submit systems for the English-French, 
English-German, German-English and Spanish-English language pairs.


TASK DESCRIPTION

In the cross-lingual pronoun prediction task, you are given a source-language 
document with a lemmatised and POS-tagged human-authored translation and a set 
of word alignments between the two languages. In the translation, the 
lemmatised tokens aligned to the source-language third-person pronouns are 
substituted by placeholders. Your task is to predict, for each placeholder, the 
fully inflected word token that should replace the placeholder from a small, 
closed set of classes, i.e., to provide the fully inflected translation of the 
source pronoun in the context sketched by the lemmatised/tagged target side. 
You may use any type of information that you can extract from the documents.

Lemmatised and POS-tagged target-language data is provided in place of fully 
inflected text. The provision of lemmatised data is intended both to provide a 
challenging task, and to simulate a scenario that is more closely aligned with 
working with machine translation system output. POS tags provide additional 
information which may be useful in the disambiguation of lemmas (e.g. noun vs. 
verb, etc.) and in the detection of patterns of pronoun use.

The pronoun prediction task will be run for the following sub-tasks:
* English-to-French
* English-to-German
* German-to-English
* Spanish-to-English (new this year)

Details of the source-language pronouns and the prediction classes that exist 
for each of the above sub-tasks are provided in the following section. 
The different combinations of source-language pronoun and target-language 
prediction classes represent some of the different problems that MT systems 
face when translating pronouns for a given language pair and translation 
direction.

The task will be evaluated automatically by matching the predictions against 
the words found in the reference translation and computing the overall accuracy 
as well as precision, recall and F-score for each class. The primary score for the 
evaluation is the macro-averaged F-score over all classes. Compared to 
accuracy, the macro-averaged F-score favours systems that consistently perform 
well on all classes and penalises systems that maximise the performance on 
frequent classes while sacrificing infrequent ones.
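
To illustrate the difference between accuracy and the macro-averaged F-score, 
here is a minimal Python sketch. It is not the official scorer: the function 
names are our own, the averaging is assumed to run over every class seen in 
either the reference or the predictions, and the example labels are made up.

```python
from collections import Counter


def per_class_prf(gold, predicted):
    """Return {label: (precision, recall, f1)} for labels seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it did not belong
            fn[g] += 1  # missed an instance of gold label g
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F-scores (assumed averaging scheme)."""
    scores = per_class_prf(gold, predicted)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(gold, predicted):
    """Fraction of placeholders whose predicted class matches the reference."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Made-up example: a system that always predicts the most frequent class.
gold = ["il"] * 8 + ["elle", "on"]
majority = ["il"] * 10
print(accuracy(gold, majority))
print(macro_f1(gold, majority))
```

In this toy example the always-"il" system reaches an accuracy of 0.8, but its 
macro-averaged F-score is only about 0.30, because the F-scores for "elle" and 
"on" are both zero.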

The data supplied for the classification task consists of parallel 
source-target text with word alignments. In the target-language text, a subset 
of the words aligned to source-language occurrences of a specified set of 
pronouns have been replaced by placeholders of the form REPLACE_xx, where xx is 
the index of the source-language word the placeholder is aligned to. Your task 
is to predict one of the classes listed in the relevant source-target section 
below, for each occurrence of a placeholder.
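
As a rough illustration of working with the placeholders, the following Python 
sketch locates REPLACE_xx tokens in a tokenised target sentence and fills them 
with predicted classes. It assumes whitespace-separated tokens in which the 
placeholder stands alone as a token; the exact file layout of the released data 
is not described here, and the helper names and the example sentence are 
hypothetical.

```python
import re

# Matches a whole token of the form REPLACE_xx, where xx is an integer index.
PLACEHOLDER = re.compile(r"REPLACE_(\d+)")


def find_placeholders(target_tokens):
    """Return (target_position, source_index) for each REPLACE_xx token."""
    found = []
    for position, token in enumerate(target_tokens):
        match = PLACEHOLDER.fullmatch(token)
        if match:
            found.append((position, int(match.group(1))))
    return found


def fill_placeholders(target_tokens, predictions):
    """Replace placeholders, left to right, with the predicted classes."""
    slots = find_placeholders(target_tokens)
    if len(slots) != len(predictions):
        raise ValueError("need exactly one prediction per placeholder")
    filled = list(target_tokens)
    for (position, _source_index), label in zip(slots, predictions):
        filled[position] = label
    return filled


# Made-up lemmatised target sentence with two placeholders.
tokens = "REPLACE_0 pleuvoir et REPLACE_5 être froid".split()
print(find_placeholders(tokens))
print(fill_placeholders(tokens, ["il", "OTHER"]))
```

The source index returned by find_placeholders can be used to look up the 
aligned source-language word; whether the released indices count from zero or 
one should be checked against the data.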


SOURCE-LANGUAGE PRONOUN SETS AND TARGET-LANGUAGE PREDICTION CLASS DETAILS

The following sections describe the set of source-language pronouns and 
target-language classes to be predicted, for each of the four sub-tasks. 

This year, a Spanish-to-English sub-task has been included. This pair involves 
the additional difficulty of having to generate English pronouns for Spanish 
null subjects. The training data follows the same format as that of the other 
language pairs. The difference is that the REPLACE_xx placeholder points to the 
position of a third-person Spanish verb with no overt subject. 

You should *always* predict either a word token or "OTHER". See prediction 
class lists below for a list of word tokens to predict for each sub-task.

English-to-French

This sub-task will concentrate on the translation of subject position "it" and 
"they" from English into French. The following prediction classes exist for 
this sub-task:

* ce: The French pronoun ce (sometimes with elided vowel as c') as in the 
expression c'est "it is"
* elle: Feminine singular subject pronoun
* elles: Feminine plural subject pronoun
* il: Masculine singular subject pronoun
* ils: Masculine plural subject pronoun
* cela: Demonstrative pronouns. Includes "cela", "ça", the misspelling "ca", 
and the rare elided form "ç'"
* on: Indefinite pronoun
* OTHER: Some other word, or nothing at all, should be inserted
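
The class definitions above imply a mapping from surface forms to classes, for 
example when deriving labels from fully inflected French text. A minimal Python 
sketch of such a mapping follows. The function name is our own, and the 
lowercasing and apostrophe handling are assumptions rather than a description 
of how the official data was prepared.

```python
# Classes that correspond to actual word forms (everything else is OTHER).
FRENCH_WORD_CLASSES = {"ce", "elle", "elles", "il", "ils", "cela", "on"}

# Surface variants listed in the class descriptions, mapped to their class.
FRENCH_VARIANTS = {
    "c'": "ce",     # elided form, as in c'est
    "ça": "cela",
    "ca": "cela",   # misspelling of "ça"
    "ç'": "cela",   # rare elided form
}


def to_french_class(token):
    """Map a French surface token to one of the prediction classes."""
    # Assumption: case is ignored and typographic apostrophes are normalised.
    word = token.lower().replace("\u2019", "'")
    word = FRENCH_VARIANTS.get(word, word)
    return word if word in FRENCH_WORD_CLASSES else "OTHER"


for form in ["Ça", "c'", "elles", "le", ""]:
    print(repr(form), "->", to_french_class(form))
```

Any token outside the listed forms, including the empty string for "nothing at 
all", falls into the OTHER class.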

Spanish-to-English

This sub-task will concentrate on the translation of third-person Spanish verbs 
without an overt subject. The following prediction classes exist for this 
sub-task:

* he: Masculine singular subject pronoun
* she: Feminine singular subject pronoun
* it: Non-gendered singular subject pronoun
* they: Non-gendered plural subject pronoun
* there: Existential "there"
* OTHER: Some other word, or nothing at all, should be inserted

English-to-German

This sub-task will concentrate on the translation of subject position "it" and 
"they" from English into German. The following prediction classes exist for 
this sub-task:

* er: Masculine singular subject pronoun
* sie: Feminine singular subject pronoun
* es: Neuter singular subject pronoun
* man: Indefinite pronoun
* OTHER: Some other word, or nothing at all, should be inserted

German-to-English

This sub-task will concentrate on the translation of subject position "er", 
"sie" and "es" from German into English. The following prediction classes exist 
for this sub-task:

* he: Masculine singular subject pronoun
* she: Feminine singular subject pronoun
* it: Non-gendered singular subject pronoun
* they: Non-gendered plural subject pronoun
* you: Second-person pronoun (with both generic and deictic uses)
* this: Demonstrative pronouns (singular). Includes both "this" and "that"
* these: Demonstrative pronouns (plural). Includes both "these" and "those"
* there: Existential "there"
* OTHER: Some other word, or nothing at all, should be inserted

