CALL FOR PARTICIPATION
===============================================================
WMT 2017 Shared Task on Bandit Learning for Machine Translation
===============================================================
(collocated with EMNLP 2017)

Check the website for details:
http://www.statmt.org/wmt17/bandit-learning-task.html




###### BANDIT LEARNING FOR MACHINE TRANSLATION ######

Bandit Learning for MT is a framework to train and improve MT systems by
learning from weak or partial feedback: instead of a gold-standard
human-generated translation, the learner only receives feedback on a single
proposed translation (which is why the feedback is called partial), in the
form of a translation quality judgement (which can be as weak as a binary
acceptance/rejection decision).

Amazon and the University of Heidelberg organize this shared task with the
goal of encouraging researchers to investigate algorithms that learn from weak
user feedback instead of from human references or post-edits, which require
skilled translators. We are interested in systems that learn efficiently and
effectively from this type of feedback, i.e. that learn fast and achieve high
translation quality. Developing such algorithms is interesting for interactive
machine learning and for learning from human feedback in NLP in general.

In the WMT task setup, user feedback is simulated by a service hosted on
Amazon Web Services (AWS): participants submit translations, receive feedback,
and use this feedback for training an MT model. Reference translations will
not be revealed at any point; evaluations are also done via the service.

###### IMPORTANT DATES ######

All dates are preliminary.

Registration via e-mail     till March 19, 2017
Access to mock service      March, 2017
Access to dev service       March 28, 2017
Online learning starts      April 25, 2017
Notification of results     May 26, 2017
Paper submission deadline   June 9, 2017
Acceptance notification     June 30, 2017
Camera-ready deadline       July 14, 2017

###### WHY IS IT CALLED BANDIT LEARNING? ######

The name bandit is inherited from a model in which, in each round, a gambler
in a casino pulls the arm of a different slot machine, called a "one-armed
bandit", with the goal of maximizing the reward relative to the maximal
possible reward, without a priori knowledge of the optimal slot machine. In
MT, pulling an arm corresponds to proposing a translation, i.e. choosing an
action; rewards correspond to user feedback on translation quality. Bandit
learners can be seen as one-state Markov Decision Processes (MDPs), which
connects them to reinforcement learning.
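
To make the connection concrete, the following is a minimal sketch (not part
of the task) of how a single round of bandit feedback could update a
log-linear translation model via a policy-gradient style step. All names here
(the candidate feature vectors, the reward callback) are hypothetical
placeholders for illustration only.

import numpy as np

def softmax(scores):
    # numerically stable softmax over candidate scores
    scores = scores - scores.max()
    exp = np.exp(scores)
    return exp / exp.sum()

def bandit_update(w, candidate_feats, reward_fn, lr=0.001, rng=np.random):
    """One online update from a single reward in [0, 1].

    w               -- current weight vector, shape (D,)
    candidate_feats -- feature vectors of candidate translations, shape (K, D)
    reward_fn       -- callback returning weak feedback for the sampled index
    """
    probs = softmax(candidate_feats @ w)      # policy over candidates
    k = rng.choice(len(probs), p=probs)       # "pull an arm": sample one translation
    r = reward_fn(k)                          # weak feedback in [0, 1]
    expected_feats = probs @ candidate_feats  # E[phi] under the current policy
    # stochastic gradient of expected reward (score-function estimator)
    w = w + lr * r * (candidate_feats[k] - expected_feats)
    return w, k, r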

###### ONLINE LEARNING PROTOCOL ######

Bandit learning follows an online learning protocol: on each of a sequence of
iterations, the learner receives a source sentence, predicts a translation,
and receives a reward in the form of a task loss evaluation of the predicted
translation. The learner does not know what the correct prediction looks like,
nor what would have happened if it had predicted differently.

FOR t = 1, ..., T DO
    * RECEIVE SOURCE SENTENCE
    * PREDICT TRANSLATION
    * RECEIVE FEEDBACK TO PREDICTED TRANSLATION
    * UPDATE SYSTEM

Online interaction is done by accessing an AWS-hosted service that provides
source sentences to the learner (step 1) and returns feedback (step 3) for the
translation predicted by the learner (step 2). The learner updates its
parameters using the feedback (step 4) and continues to the next example.
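
As a rough illustration, a client loop around the service could look like the
sketch below. The endpoint URL, the request/response fields and the
translate/update hooks are hypothetical assumptions; the actual client code
snippets and API details are distributed to registered participants.

import requests

SERVICE = "https://example-bandit-service.amazonaws.com"  # placeholder URL

def run_online_learning(mt_system, num_iterations):
    for _ in range(num_iterations):
        # step 1: receive a source sentence from the service
        src = requests.get(SERVICE + "/source").json()["source"]
        # step 2: predict a translation with the current model
        hyp = mt_system.translate(src)
        # step 3: submit the translation and receive weak feedback in [0, 1]
        resp = requests.post(SERVICE + "/feedback",
                             json={"source": src, "translation": hyp})
        reward = resp.json()["reward"]
        # step 4: update model parameters from the single reward
        mt_system.update(src, hyp, reward)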

###### DATA ######

For training seed systems, out-of-domain parallel data is restricted to the
German-English Europarl, NewsCommentary, CommonCrawl and Rapid data from the
News Translation (constrained) task; monolingual English data from the
constrained task is also allowed.

The in-domain data sequence for online learning comes from the e-commerce
domain and is provided by Amazon. These data can only be accessed via the
service. No reference translations will be revealed; only feedback to
submitted translations is returned by the service.

Simulated _reward-type_ real-valued feedback will be based on a combination of
several quality models, including automatic measures w.r.t. human references,
and will be normalized to the range [0,1] ('very bad' to 'excellent').
Feedback can only be accessed via the service, and only one feedback request
is allowed per source sentence.

###### SERVICES ######

Three AWS-hosted services will be provided:
 * MOCK SERVICE to test the client API: Will sample from a tiny dataset
and simply return BLEU as feedback.
 * DEVELOPMENT SERVICE to tune algorithms and hyperparameters: Will
sample from a larger in-domain dataset. Feedback will be parameterized
differently from the learning service to prevent learning from
development data. Several runs will be allowed and evaluation results
will be communicated to the participants.
 * ONLINE LEARNING SERVICE: Will sample from a very large in-domain
dataset. Participants will have to consume a fixed number of samples
during the allocated online learning period to be eligible for final
evaluation.

The respective data samples will be the same for all participants.

###### EVALUATION ######

The following main evaluation metrics will be used:

 * ONLINE: cumulative per-sentence reward against the number of iterations,
 * OFFLINE: standard automatic MT evaluation metric on a held-out
in-domain test set,
 * RELATIVE to the out-of-domain starting point by doing test set
evaluations at the beginning and at the end of the online learning
sequence.

Note that all evaluations are done during online learning and not in a separate
offline testing phase.
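
For illustration, the ONLINE metric can be tracked on the client side roughly
as follows, assuming the per-sentence rewards returned by the service are
logged locally; this is a sketch, not part of the official evaluation.

from itertools import accumulate

def cumulative_reward_curve(rewards):
    """rewards: per-sentence rewards in [0, 1], in iteration order.
    Returns the cumulative reward after each iteration."""
    return list(accumulate(rewards))

# e.g. rewards = [0.2, 0.5, 0.4]  ->  curve = [0.2, 0.7, 1.1]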

###### HOW TO PARTICIPATE ######

 * Pick your favourite MT system.
 * Train an out-of-domain model on allowed data.
 * REGISTER for the task via email ([email protected])
and receive further instructions on how to access the service.
 * Wrap CLIENT CODE SNIPPETS around your MT system.
 * SETUP: Test the in-domain-training procedure with the MOCK SERVICE
and ensure that your client sends translations and receives feedback.
 * TUNE: Find a clever strategy and good hyperparameters for learning from
weak feedback (e.g. by simulating weak feedback from parallel data, as
sketched below, or by using the DEVELOPMENT SERVICE).
 * TRAIN your in-domain model by starting from your out-of-domain
model, submitting translations to the ONLINE LEARNING SERVICE,
receiving feedback, and updating your model from this feedback.
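
As a sketch of the simulation idea mentioned in the TUNE step, weak feedback
can be imitated on your own parallel data by scoring each proposed translation
with sentence-level BLEU against the held-back reference. Using NLTK here, and
the translate/update hooks, are our own assumptions for illustration, not task
requirements; any sentence-level quality score in [0, 1] would do.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def simulated_feedback(reference, hypothesis):
    """Return a reward in [0, 1] for a single proposed translation."""
    smooth = SmoothingFunction().method1  # avoid zero scores on short sentences
    return sentence_bleu([reference.split()], hypothesis.split(),
                         smoothing_function=smooth)

def tune_on_parallel_data(mt_system, parallel_data):
    """parallel_data: iterable of (source, reference) sentence pairs."""
    for src, ref in parallel_data:
        hyp = mt_system.translate(src)         # predict a translation
        reward = simulated_feedback(ref, hyp)  # weak feedback only
        mt_system.update(src, hyp, reward)     # the reference itself is never shown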

###### ORGANIZERS ######

Amazon Development Center Berlin and Heidelberg University