This is the call for participation in the MT shared tasks and for research papers at the 9th Workshop on Asian Translation (WAT2022), a workshop of COLING 2022. If you work on machine translation, please join us.

IMPORTANT DATES
---------------
July 11 - Shared Task Submission Deadline
July 11 - Research Paper Submission Deadline
August 1 - System Description Paper for Shared Tasks Submission Deadline
August 22 - Notification of Acceptance for Research Papers
August 29 - Review Feedback of System Description Papers
September 5 - Camera-ready Deadline (both Research and System Description Papers)
September 19 - Workshop Proceedings Deadline
October 12-17 - Workshop Date

* All deadlines are calculated at 11:59PM UTC-12

Best regards,

---------------------------------------------------------------------------
WAT2022
(The 9th Workshop on Asian Translation)
in conjunction with COLING2022
http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2022
OCTOBER 12-17, 2022 / GYEONGJU, REPUBLIC OF KOREA

Following the success of the previous WAT workshops (WAT2014 -- WAT2021), WAT2022 will bring together machine translation researchers and users to try, evaluate, share and discuss brand-new ideas about machine translation.

For the 9th WAT, we will include the following new tasks:

* English <--> Japanese Parallel Corpus Filtering Task
* Khmer --> English/French Speech Translation Task
* Chinese <--> Japanese Restricted Translation Task
* Japanese --> English Video Guided Translation Task
* English --> Bengali Multi-Modal Translation Task
* Sinhala, Nepali, Assamese, Sindhi, Urdu <--> English (5 new languages in the MultiIndicMT task)
* English <--> Japanese/Korean/Chinese NICT-SAP structured document translation Task
* English <--> Vietnamese (new pair added to the NICT-SAP multilingual multi-domain Task)

together with the following continuing tasks:

* Document-level Translation Tasks
  - English/Chinese <--> Japanese scientific paper
  - English <--> Japanese newswire
  - English <--> Japanese business scene dialogue
* English/Chinese/Korean <--> Japanese patent task
* English --> Hindi/Malayalam Multi-Modal Translation Task
* English <--> Japanese Restricted Translation Task

In addition to the shared tasks, the
workshop will also feature scientific papers on topics related to machine translation, especially for Asian languages. Topics of interest include, but are not limited to:

- analysis of the automatic/human evaluation results in the past WAT workshops
- word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid machine translation
- Asian language processing
- incorporating linguistic information into machine translation
- decoding algorithms
- system combination
- error analysis
- manual and automatic machine translation evaluation
- machine translation applications
- quality estimation
- domain adaptation
- machine translation for low resource languages
- language resources

************************* IMPORTANT NOTICE *************************
Participants of the previous workshop are also required to sign up to WAT2022
********************************************************************

TRANSLATION TASKS
-----------------
The task is to improve the text translation quality for scientific papers and patent documents. Participants choose any of the subtasks in which they would like to participate and translate the test data using their machine translation systems. The WAT organizers will evaluate the results submitted using automatic evaluation and human evaluation. We will also provide a baseline machine translation.

Tasks:
* Document-level translation tasks:
  - ASPEC+ParaNatCom: English --> Japanese Scientific Paper
  - BSD Corpus: English <--> Japanese Business Scene Dialogue
  - JIJI Corpus: English <--> Japanese Newswire
  - NICT-SAP: Hindi/Thai/Malay/Indonesian/Vietnamese <--> English
  - NICT-SAP: Japanese/Korean/Chinese <--> English (structured document translation)
* Multimodal translation tasks:
  - Visual Genome: English --> Hindi/Malayalam/Bengali
  - Ambiguous MS COCO: English <--> Japanese
* Video Guided Translation task:
  - VISA: Japanese --> English
* Indic tasks:
  - MultiIndicMT: Assamese/Bengali/Gujarati/Hindi/Kannada/Malayalam/Marathi/Nepali/Odia/Punjabi/Tamil/Telugu/Urdu/Sindhi/Sinhala <--> English
* Patent task:
  - JPC3: English/Chinese/Korean <--> Japanese
* Restricted Translation tasks:
  - English/Chinese <--> Japanese
* Parallel Corpus Filtering task:
  - English <--> Japanese

Dataset:

* Scientific paper
WAT uses ASPEC for the dataset, including training, development, development test and test data. Participants of the scientific papers subtask must obtain a copy of ASPEC themselves. ASPEC consists of approximately 3 million Japanese-English parallel sentences from paper abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese paper excerpts (ASPEC-JC).

* Patent
WAT uses the JPO Patent Corpus, which is constructed by the Japan Patent Office (JPO). This corpus consists of 1 million English-Japanese parallel sentences, 1 million Chinese-Japanese parallel sentences, and 1 million Korean-Japanese parallel sentences from patent descriptions in four categories. Participants of the patent tasks are required to obtain it from the JPO Patent Corpus page on the WAT site. Unlike the previous tasks at WAT2018-2021, new test-N4 sets will additionally be used, and the previous test-N sets will be replaced by new test-2022 sets.

* IT and Wikinews

- Hindi/Thai/Malay/Indonesian/Vietnamese <--> English
In collaboration with SAP and NICT, WAT will continue the translation task for English to/from Hindi, Thai, Malay and Indonesian. Additionally, this year English to/from Vietnamese evaluation data is also available. The evaluation data belongs to the IT domain (Software Documentation) and Wikinews domain (Asian Language Treebank). Participants will be expected to train systems and submit translations for all language pairs (to and from English) and both domains using any existing monolingual or parallel data. Given the growing focus on a universal translation model for multiple languages and domains, WAT encourages a single multilingual and multi-domain model for all language pairs and both domains (IT as well as Wikinews). Additional details will be given on the WAT 2022 website.

- Japanese/Chinese/Korean <--> English
In addition to the task above, WAT will offer a new task for English to/from Japanese, Chinese and Korean structured document translation. Structured pages/documents contain sentences annotated with rich meta information. For example:
"This is a <b>sentence</b>."
is an example of a sentence in a structured document. Its translation in Spanish should be:
"Esta es una <b>frase</b>."
where the <b> tag appropriately encloses the translation of the word "sentence". Structured document translation is challenging as the translation system will have to deal with the alignment of the content enclosed in tags, especially when training data without structure information is unavailable. Additional details will be given on the WAT 2022 website.

* Newswire
WAT uses JIJI Corpus, which is constructed by Jiji Press Ltd. in collaboration with the National Institute of Information and Communications Technology (NICT). This corpus consists of a Japanese-English news corpus of 200K parallel sentences, from Jiji Press news with various categories.
Participants of the newswire subtask are required to obtain it from the JIJI Corpus page on the WAT2022 site.

* Indic

- Indian language <--> English multilingual translation task.
This task is a successor to the 2018, 2020, and 2021 tasks with major improvements. There has been an increase in the available datasets for Indian languages in the past few years along with major advances in multilingual learning. The task will involve training a multilingual model for 15 Indian languages to English (and vice-versa) translation. 5 new languages, Urdu, Nepali, Sindhi, Sinhala and Assamese, have been added this year. The goal is to encourage exploration of methods which utilize multilingualism and language relatedness to improve translation quality for low-resource languages while having a single, compact translation model. The evaluation set is 15-way parallel, enabling the potential evaluation of non-English centric language pairs, some of which we will evaluate.

* Multimodal
Given the growing interest in multimodal NLP and the warm response from the participants for the “WAT 2019 and 2020 Multimodal Translation Tasks”, WAT will evaluate the following multimodal tasks:

- English --> Hindi Multimodal (Visual Genome)
WAT will continue organizing the multimodal English --> Hindi translation task where the input will be text and an image and the output will be a caption (text). The training set contains around 30,000 segments. Additional details will be given on the task website.

- English --> Malayalam Multimodal (Visual Genome)
WAT will continue organizing the multimodal English --> Malayalam translation task where the input will be text and an image and the output will be a caption (text). The training set contains around 30,000 segments. Additional details will be given on the task website.

- English --> Bengali Multimodal (Visual Genome)
WAT will organize a new multimodal English --> Bengali translation task where the input will be text and an image and the output will be a caption (text). The training set contains around 30,000 segments. Additional details will be given on the task website.

- Japanese <--> English Multimodal (Ambiguous MS COCO)
WAT will organize an additional multimodal Japanese <--> English translation task where the evaluation set, Ambiguous MS COCO, will focus on translation of ambiguous words and sentences. Along with the Flickr30kEnt-JP dataset, the MS COCO English data may also be used. Additional details will be given on the task website.

* Parallel Corpus Filtering
We also plan to add parallel corpus filtering tasks, which ask participants to clean a noisy parallel corpus, after which models are trained with a fixed setting and evaluated for accuracy. Competitors are required to improve translation accuracy only by removing training data that may hurt the model. This year, we will provide a noisy Japanese-English parallel corpus, a language pair that other shared tasks have not yet focused on.

EVALUATION
----------
Automatic evaluation:
We are providing an automatic evaluation server. It is free for everyone, but you need to create an account to submit results for evaluation. Viewing the list of evaluation results does not require an account.

Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2022/
Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/

Human evaluation:
Both crowdsourcing evaluation and JPO adequacy evaluation will be carried out for selected subtasks and selected submitted systems (the details will be announced later).

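As a rough illustration of the Parallel Corpus Filtering task described earlier, the sketch below removes sentence pairs using a few simple heuristics (an empty side, source identical to target, overly long segments, extreme length ratio). The function name `filter_pairs`, the thresholds, and the character-based length ratio are assumptions made for this example only; they are not the official WAT2022 baseline or training setting.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (English sentence, Japanese sentence)


def filter_pairs(pairs: Iterable[Pair],
                 max_ratio: float = 4.0,
                 max_chars: int = 500) -> List[Pair]:
    """Keep only pairs that pass a few simple noise heuristics.

    Hypothetical helper: the thresholds are illustrative, not WAT settings.
    """
    kept: List[Pair] = []
    for en, ja in pairs:
        en, ja = en.strip(), ja.strip()
        # Drop pairs with an empty side.
        if not en or not ja:
            continue
        # Drop untranslated copies (source identical to target).
        if en == ja:
            continue
        # Drop overly long segments.
        if len(en) > max_chars or len(ja) > max_chars:
            continue
        # Drop pairs whose character-length ratio is extreme in either direction.
        ratio = max(len(en), len(ja)) / min(len(en), len(ja))
        if ratio > max_ratio:
            continue
        kept.append((en, ja))
    return kept


# Tiny made-up corpus: only the first pair should survive.
noisy = [
    ("This is a pen.", "これはペンです。"),
    ("", "空の原文"),
    ("Click here", "Click here"),
    ("OK", "これは非常に長い文であり、原文とは明らかに対応していません。"),
]
clean = filter_pairs(noisy)
```

Because the task only allows removing data, a filter of this shape never edits or adds sentences; stronger approaches would typically replace the heuristics with a learned scoring function while keeping the same keep-or-drop interface.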
ORGANIZERS
----------
- Toshiaki Nakazawa, The University of Tokyo, Japan [GENERAL, ASPEC+ParaNatCom, BSD]
- Isao Goto, Japan Broadcasting Corporation (NHK), Japan [GENERAL, JIJI]
- Hidaya Mino, Japan Broadcasting Corporation (NHK), Japan [GENERAL, JIJI]
- Chenchen Ding, National Institute of Information and Communications Technology (NICT), Japan [GENERAL]
- Raj Dabre, National Institute of Information and Communications Technology (NICT), Japan [MultiIndicMT, NICT-SAP]
- Anoop Kunchookuttan, Microsoft AI and Research, India [MultiIndicMT]
- Shohei Higashiyama, National Institute of Information and Communications Technology (NICT), Japan [JPC]
- Hiroshi Manabe, National Institute of Information and Communications Technology (NICT), Japan [GENERAL]
- Shantipriya Parida, Silo AI, Finland [Hindi Visual Genome, Malayalam Visual Genome, Bengali Visual Genome]
- Ondřej Bojar, Charles University, Prague, Czech Republic [Hindi Visual Genome, Malayalam Visual Genome, Bengali Visual Genome]
- Chenhui Chu, Kyoto University, Japan [Ambiguous MS COCO]
- Akiko Eriguchi, Microsoft, USA [Restricted Translation]
- Kaori Abe, Tohoku University, Japan [Restricted Translation]
- Yusuke Oda, LegalForce, Japan [Restricted Translation, Parallel Corpus Filtering]
- Makoto Morishita, NTT, Japan [Parallel Corpus Filtering]
- Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST), Japan [GENERAL]
- Sadao Kurohashi, Kyoto University, Japan [GENERAL]
- Pushpak Bhattacharyya, Indian Institute of Technology Patna (IITP), India [GENERAL]

CONTACT
-------
wat-organi...@googlegroups.com