Dear all MT researchers/users,

I'm Toshiaki Nakazawa from The University of Tokyo.
This is the call for participation for the shared tasks of
the 5th Workshop on Asian Translation (WAT2018).
http://lotus.kuee.kyoto-u.ac.jp/WAT/
WAT2018 will be co-located with PACLIC32 (Dec 1-3 in Hong Kong).
http://www.cbs.polyu.edu.hk/2018paclic/index.php

If you are working on machine translation, please join us.

IMPORTANT DATES
---------------
August 31             Translation Task Submission Deadline
October 26            System Description Paper Submission Deadline
November 2            Review Feedback of System Description Paper
November 9            Camera-ready Deadline
December 1, 2 or 3    WAT2018

* All deadlines are calculated at 11:59PM UTC-7

Best regards,

---------------------------------------------------------------------------
WAT2018
(The 5th Workshop on Asian Translation)
in collaboration with PACLIC32
http://lotus.kuee.kyoto-u.ac.jp/WAT/
December 1, 2 or 3, 2018, Hong Kong

Following the success of the previous WAT workshops, WAT2018 will bring
together machine translation researchers and users to try, evaluate, share
and discuss brand-new ideas about machine translation.

WAT2018 does NOT accept research papers. Instead, you can submit them to
PACLIC32.
http://www.cbs.polyu.edu.hk/2018paclic/call-for-papers.php

What's NEW in WAT2018:
* baseline translations are updated to NMT (from PBSMT)
* additional test data for patent tasks
* Myanmar-English translation tasks
* multilingual translation subtask for 10 Indian languages

************************* IMPORTANT NOTICE *************************
Participants of the previous workshop are also required to sign up to WAT2018
********************************************************************

TASK
----
The task is to improve the text translation quality for scientific papers
and patent documents. Participants choose any of the subtasks in which they
would like to participate and translate the test data using their machine
translation systems. The WAT organizers will evaluate the submitted results
using automatic evaluation and human evaluation. We will also provide a
baseline machine translation.
Tasks:
  Scientific Paper Tasks: [Asian Scientific Paper Excerpt Corpus (ASPEC)]
    English/Chinese <--> Japanese
  Patent Tasks: [Japan Patent Office Patent Corpus 2.0 (JPC2)]
    English/Chinese/Korean <--> Japanese
    Chinese -> Japanese Expression Pattern Task
  Newswire Tasks:
    English <--> Japanese [JIJI Corpus]
  Indian Language Tasks:
    Hindi <--> English [IIT Bombay (IITB) Corpus]
    10 Indian Languages [NEW!!]
  Mixed domain tasks: UCSY and ALT Corpora
    Myanmar (Burmese) <--> English [NEW!!]
  Recipe Tasks: [Cookpad Comparable Corpus]
    Japanese <--> English

Dataset:
* Scientific Paper Tasks:
  WAT uses ASPEC as the dataset, including training, development,
  development test and test data. Participants of the scientific paper
  tasks must obtain a copy of ASPEC by themselves. ASPEC consists of
  approximately 3 million Japanese-English parallel sentences from paper
  abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese paper
  excerpts (ASPEC-JC).

* Patent Tasks:
  WAT uses JPO Patent Corpus 2.0 (JPC2), which is constructed by the Japan
  Patent Office (JPO). This corpus consists of 1 million parallel sentences
  from patent descriptions in four categories (Chemistry, Electricity,
  Machine and Physics) for each language pair (English-Japanese,
  Chinese-Japanese and Korean-Japanese). Participants are required to
  obtain it from the WAT2018 site of JPC2.

  - English/Chinese/Korean <--> Japanese:
    These tasks evaluate the performance of a translation model in the same
    way as the other translation tasks. Unlike the previous tasks at
    WAT2015, WAT2016 and WAT2017, the new test sets for these tasks consist
    of (a) patent documents published between 2011 and 2013, which were
    used in the past years' WAT, and (b) ones published between 2016 and
    2017 for each language pair. We will also evaluate performance on
    section (a) separately, so that results can be compared with systems
    submitted in the past years' WAT.
  - Chinese -> Japanese Expression Pattern Task:
    This task evaluates the performance of a translation model for each
    predefined category of expression patterns, which corresponds to title
    of invention (TIT), abstract (ABS), scope of claim (CLM) or description
    (DES). The test set of this task consists of sentences, each of which
    is annotated with its corresponding category of expression patterns.

* Newswire Tasks (English <--> Japanese):
  WAT uses JIJI Corpus, which is constructed by Jiji Press Ltd. in
  collaboration with the National Institute of Information and
  Communications Technology (NICT). This corpus consists of a
  Japanese-English news corpus of 200K parallel sentences from Jiji Press
  news in various categories. Participants of the newswire tasks are
  required to obtain it from the WAT2017 site of JIJI Corpus.

* Indian Language Tasks:
  TBA (keep watching our website)

* Myanmar <--> English Tasks:
  WAT uses UCSY Corpus and ALT Corpus. The UCSY corpus and a portion of the
  ALT corpus are used as training data, which amount to around 220,000
  lines of sentences and phrases. The development and test data are from
  the ALT corpus.

* Recipe Tasks:
  WAT uses Recipe Corpus, which is constructed by Cookpad Inc. This corpus
  consists of 16,282 Japanese-English parallel sentences from recipes.
  Participants of the recipe tasks are required to obtain it from the
  WAT2018 site of Recipe Corpus.

EVALUATION
----------
Automatic evaluation:
  We provide an automatic evaluation server. It is free for everyone, but
  you need to create an account to submit translations for evaluation.
  Viewing the list of evaluation results does not require an account.
  Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/registration/index.html
  Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/index.html

Human evaluation:
  Both crowdsourcing evaluation and JPO adequacy evaluation will be carried
  out for selected subtasks and selected submitted systems (the details
  will be announced later). Participants can submit one translation result
  for each subtask.
ORGANIZERS
----------
Pushpak Bhattacharyya, Indian Institute of Technology Bombay (IIT), India
Raj Dabre, National Institute of Information and Communications Technology (NICT), Japan
Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan
Isao Goto, Japan Broadcasting Corporation (NHK), Japan
Jun Harashima, Cookpad Inc., Japan
Shohei Higashiyama, National Institute of Information and Communications Technology (NICT), Japan
Hideo Kazawa, Google, Japan
Anoop Kunchukuttan, Microsoft Research India, India
Sadao Kurohashi, Kyoto University, Japan
Hideya Mino, Japan Broadcasting Corporation (NHK), Japan
Toshiaki Nakazawa, The University of Tokyo, Japan
Graham Neubig, Carnegie Mellon University (CMU), USA
Yusuke Oda, Google, Japan
Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan

CONTACT
-------
[email protected]
