CALL FOR PARTICIPATION

===============================================================
WMT 2017 Shared Task on Bandit Learning for Machine Translation
===============================================================
(collocated with EMNLP 2017)
Check the website for details: http://www.statmt.org/wmt17/bandit-learning-task.html

###### BANDIT LEARNING FOR MACHINE TRANSLATION ######

Bandit Learning for MT is a framework for training and improving MT systems by learning from weak or partial feedback: instead of a gold-standard human-generated translation, the learner only receives feedback on a single proposed translation (this is why it is called partial), in the form of a translation quality judgement (which can be as weak as a binary acceptance/rejection decision).

Amazon and the University of Heidelberg organize this shared task with the goal of encouraging researchers to investigate algorithms that learn from weak user feedback instead of from human references or post-edits, which require skilled translators. We are interested in systems that learn efficiently and effectively from this type of feedback, i.e., that learn fast and achieve high translation quality. Developing such algorithms is of interest for interactive machine learning and for learning from human feedback in NLP in general.

In the WMT task setup, user feedback is simulated by a service hosted on Amazon Web Services (AWS), where participants can submit translations, receive feedback, and use this feedback for training an MT model. Reference translations will not be revealed at any point; evaluations are also done via the service.

###### IMPORTANT DATES ######

All dates are preliminary.

Registration via e-mail        until March 19, 2017
Access to mock service         March 2017
Access to dev service          March 28, 2017
Online learning starts         April 25, 2017
Notification of results        May 26, 2017
Paper submission deadline      June 9, 2017
Acceptance notification        June 30, 2017
Camera-ready deadline          July 14, 2017

###### WHY IS IT CALLED BANDIT LEARNING? ######

The name "bandit" is inherited from a model in which, in each round, a gambler in a casino pulls the arm of a different slot machine, called a "one-armed bandit", with the goal of maximizing his reward relative to the maximal possible reward, without a priori knowledge of the optimal slot machine. In MT, pulling an arm corresponds to proposing a translation; rewards correspond to user feedback on translation quality. Bandit learners can be seen as one-state Markov Decision Processes (MDPs), which connects them to reinforcement learning, where proposing a translation corresponds to choosing an action.

###### ONLINE LEARNING PROTOCOL ######

Bandit learning follows an online learning protocol, where on each of a sequence of iterations the learner receives a source sentence, predicts a translation, and receives a reward in the form of a task loss evaluation of the predicted translation. The learner does not know what the correct prediction looks like, nor what would have happened if it had predicted differently.

FOR t = 1, ..., T DO
  1. RECEIVE SOURCE SENTENCE
  2. PREDICT TRANSLATION
  3. RECEIVE FEEDBACK TO PREDICTED TRANSLATION
  4. UPDATE SYSTEM

Online interaction is done by accessing an AWS-hosted service that provides source sentences to the learner (step 1) and provides feedback (step 3) on the translation predicted by the learner (step 2). The learner updates its parameters using the feedback (step 4) and continues to the next example.
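To make the protocol concrete, below is a minimal, self-contained Python sketch of a learner that follows the four steps above. The real client API is only revealed upon registration, so the DummyService class and all names in this sketch are illustrative stand-ins, not the official interface; likewise, the update rule shown (a policy-gradient step on a log-linear model over a candidate list) is just one standard choice for learning from scalar feedback, not a prescribed algorithm.

    import math
    import random
    from collections import defaultdict

    class DummyService:
        """Illustrative stand-in for the AWS-hosted service: serves source
        sentences and returns a simulated reward in [0, 1] per translation."""
        def __init__(self, pairs):
            self.pairs = pairs  # (source, hidden reference) pairs
            self.t = -1

        def get_source(self):
            self.t += 1
            return self.pairs[self.t % len(self.pairs)][0]

        def send_translation(self, hypothesis):
            # Simulated feedback: unigram overlap with the hidden reference.
            ref = set(self.pairs[self.t % len(self.pairs)][1].split())
            return len(ref & set(hypothesis.split())) / max(len(ref), 1)

    def phi(source, hypothesis):
        """Toy sparse feature map (ignores the source for brevity)."""
        return {("tok", w): 1.0 for w in hypothesis.split()}

    def softmax(scores):
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def bandit_learn(service, kbest, rounds, lr=0.1):
        """kbest(src) -> candidate translations, e.g. a k-best list from
        the out-of-domain seed system. Runs the four-step protocol."""
        w = defaultdict(float)
        for _ in range(rounds):
            src = service.get_source()                       # 1. receive source
            cands = kbest(src)
            feats = [phi(src, c) for c in cands]
            probs = softmax([sum(w[f] * v for f, v in fs.items()) for fs in feats])
            k = random.choices(range(len(cands)), probs)[0]  # sampling = exploration
            r = service.send_translation(cands[k])           # 2.+3. predict, get feedback
            # 4. Policy-gradient update: w += lr * r * (phi(y_k) - E_p[phi]).
            expected = defaultdict(float)
            for p, fs in zip(probs, feats):
                for name, v in fs.items():
                    expected[name] += p * v
            for name, v in feats[k].items():
                w[name] += lr * r * v
            for name, v in expected.items():
                w[name] -= lr * r * v
        return w

    # Tiny usage example with hand-made candidate lists (in practice these
    # would come from the seed MT system's k-best output):
    pairs = [("sehr schnelle Lieferung", "very fast delivery"),
             ("ein gutes Angebot", "a good deal")]
    cands = {"sehr schnelle Lieferung": ["very quick supply", "very fast delivery"],
             "ein gutes Angebot": ["a good offer", "a good deal"]}
    weights = bandit_learn(DummyService(pairs), lambda s: cands[s], rounds=200)

Sampling a translation from the model (rather than always taking the argmax) provides the exploration a bandit learner needs; the update then shifts probability mass toward candidates that received high reward.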
###### DATA ######

For training seed systems, out-of-domain parallel data is restricted to the German-English Europarl, NewsCommentary, CommonCrawl, and Rapid data of the News Translation (constrained) task; monolingual English data from the constrained task is also allowed.

The in-domain data for online learning is a sequence of sentences from the e-commerce domain, provided by Amazon. These data can only be accessed via the service. No reference translations will be revealed; only feedback to submitted translations is returned by the service. The simulated reward-type real-valued feedback is based on a combination of several quality models, including automatic measures w.r.t. human references, and is normalized to the range [0,1] ('very bad' to 'excellent'). Feedback can only be accessed via the service, and only one feedback value is returned per source sentence.

###### SERVICES ######

Three AWS-hosted services will be provided:

* MOCK SERVICE to test the client API: Will sample from a tiny dataset and simply return BLEU as feedback.
* DEVELOPMENT SERVICE to tune algorithms and hyperparameters: Will sample from a larger in-domain dataset. Feedback will be parameterized differently from the online learning service to prevent learning from development data. Several runs will be allowed, and evaluation results will be communicated to the participants.
* ONLINE LEARNING SERVICE: Will sample from a very large in-domain dataset. Participants will have to consume a fixed number of samples during the allocated online learning period to be eligible for the final evaluation. The respective data samples will be the same for all participants.
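Since the mock service simply returns BLEU as feedback, a similar simulator can be built locally from any held-out parallel data, which is also one way to tune a learning strategy before touching the services (see the TUNE step below). Here is a minimal sketch, assuming NLTK is available; the class and method names are our own, not part of the task API:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    class SimulatedFeedback:
        """Simulates reward-type feedback in [0, 1] by scoring a hypothesis
        with smoothed sentence-level BLEU against a hidden reference."""
        def __init__(self, parallel_pairs):
            self.refs = dict(parallel_pairs)  # source -> reference translation
            self.smooth = SmoothingFunction().method1

        def reward(self, source, hypothesis):
            reference = self.refs[source].split()
            return sentence_bleu([reference], hypothesis.split(),
                                 smoothing_function=self.smooth)

    sim = SimulatedFeedback([("sehr schnelle Lieferung", "very fast delivery")])
    print(sim.reward("sehr schnelle Lieferung", "very quick delivery"))  # value in [0, 1]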
###### EVALUATION ######

The following main evaluation metrics will be used:

* ONLINE: cumulative per-sentence reward against the number of iterations,
* OFFLINE: a standard automatic MT evaluation metric on a held-out in-domain test set,
* RELATIVE: improvement over the out-of-domain starting point, measured by test set evaluations at the beginning and at the end of the online learning sequence.

Note that all evaluations are done during online learning and not in a separate offline testing phase.

###### HOW TO PARTICIPATE ######

* Pick your favourite MT system.
* Train an out-of-domain model on the allowed data.
* REGISTER for the task via e-mail ([email protected]) and receive further instructions on how to access the service.
* Wrap CLIENT CODE SNIPPETS around your MT system.
* SETUP: Test the in-domain training procedure with the MOCK SERVICE and ensure that your client sends translations and receives feedback.
* TUNE: Find a clever strategy and good hyperparameters for learning from weak feedback (e.g. by simulating weak feedback from parallel data, as sketched above, or by using the DEVELOPMENT SERVICE).
* TRAIN your in-domain model by starting from your out-of-domain model, submitting translations to the ONLINE LEARNING SERVICE, receiving feedback, and updating your model from this feedback.

###### ORGANIZERS ######

Amazon Development Center Berlin and Heidelberg University