Dear Jack,

Many thanks for your insights. In fact, our general approach to combining statistical machine translation with linguistically informed tools goes (in my opinion) exactly in the direction that you suggest. Since some higher-level linguistic phenomena, especially those involving long-range dependencies, are difficult for phrase-based SMT models to learn, we aim to "inform" these models about such features using natural language processing tools that are capable of detecting them. Such tools can be either "statistical" or "rule-based", though they often lie somewhere in between (rules optimized through learning).

We proposed this approach for discourse connectives (Meyer Th. et al., "Machine Translation of Labeled Discourse Connectives", Proc. of AMTA 2012), and in this internship we would like to explore the case of verb tenses, which we are in fact studying jointly with linguists in Geneva.

Thanks again and best regards,

Andrei

Jack Halpern wrote (on 03/02/2013 00:26):
>
> Andrei Popescu-Belis wrote...
> >Internship at the Idiap Research Institute, Martigny, Switzerland
>
> This is Jack Halpern, head of the CJK Dictionary Institute in
> Japan.
>
> Regarding the message quoted below, mapping verb tenses between
> languages using statistical approaches sounds like an interesting
> approach, though I suspect that one cannot expect accurate,
> reliable results, mostly for the same reasons (which I discussed
> at conferences) that SMT itself has its limits.
>
> As Prof. Martin Kay has pointed out in a keynote address,
> MT and computational linguistics have gone too far down
> the statistical path, ignoring "linguistic knowledge"
> which is embedded in rules. In fact, as Professor Nagao
> pointed out at a recent conference in Kyoto, recent trends
> are to use a hybrid approach combining rule-based MT with
> statistical methods to achieve better scores.
>
> Be that as it may, verb tenses and conjugations are clearly
> a field where rules should play a dominant role. We have
> developed a comprehensive Spanish-to-English tense mapping
> table for EBMT that demonstrates this, resulting in a highly
> successful system that approaches human quality.
>
> Recently, we have released a pedagogically oriented smartphone
> app for Arabic verb tenses that provides exhaustive coverage of
> Arabic verb paradigms. This highly successful Arabic-English
> bilingual conjugator is of course based on rules, resulting from
> several years of analysis and development.
> (More information is
> available at http://www.cjk.org/cjk/arabic/cave/cavepressE.pdf )
>
> I do want to encourage MT and CL researchers and developers
> not to forget that language is a human activity requiring
> human knowledge. Although statistical methods are extremely
> powerful, they have a limit and must be supplemented by
> rules based on linguistic knowledge.
>
>
> >Learning verb tense translation from parallel
> >corpora for statistical machine translation
> >
> >http://www.idiap.ch/education-and-jobs/
> >
> >Contact: Andrei Popescu-Belis
> >
> >
> >Description
> >
> >Applications are invited for a 6-month internship (preferably at the
> >Master level) in the field of statistical machine translation (SMT).
> >
> >At Idiap, we are studying methods for using text-level information to
> >improve SMT, in the context of the Swiss COMTIS project
> >(www.idiap.ch/comtis). In particular, we have successfully combined
> >classifiers for discourse connectives with state-of-the-art SMT systems,
> >showing an improvement on such particles. In collaboration with
> >linguists, we currently analyze the features that govern the translation
> >of verb tenses, mainly from English to French. The main challenge for MT
> >is that there is no one-to-one mapping from English to French tenses,
> >and the correct choice depends on the context.
> >
> >The goal of the internship is to design and implement a method for
> >predicting the translation of verb tenses, applied to English/French
> >translation (or another European language). First, training data will be
> >generated by the word alignment of annotated parallel corpora. Then, a
> >classifier will be trained to predict tense translation based on lexical
> >and semantic features. Its output will then be used to train and test a
> >tense-aware SMT system, which will be evaluated in terms of BLEU
> >improvement but also verb-specific scores (METEOR or ACT).
> >
> >The applicants should have a background in computer science or
> >linguistics. Knowledge of computational linguistics and machine learning
> >would be an advantage. Previous experience with statistical machine
> >translation would be highly appreciated. The applicants should have
> >demonstrable programming skills in at least one scripting language such
> >as Perl or Python, or master a programming language such as Java or
> >C/C++. Good command of English and knowledge of another European
> >language (preferably French) are mandatory.
> >
> >The applications should be submitted before March 15, 2013, with
> >priority given to those submitted earlier. The internship can start
> >immediately, but no later than July 1st, 2013. The appointment is for 6
> >months, with a gross salary of 2000 CHF per month.
> >
> >About Idiap
> >
> >Idiap is an independent, non-profit research institute recognized and
> >supported by the Swiss Government, and affiliated with the Ecole
> >Polytechnique Fédérale de Lausanne (EPFL). It is located in the town of
> >Martigny in Valais, a scenic region in the south of Switzerland,
> >surrounded by the highest mountains of Europe, and offering exciting
> >recreational activities, including hiking, climbing and skiing, as well
> >as varied cultural activities. It is within close proximity to Geneva
> >and Lausanne. Although Idiap is located in the French part of
> >Switzerland, English is the working language. Free French lessons are
> >provided.
> >
> >Idiap offers competitive salaries and conditions at all levels in a
> >young, dynamic, and multicultural environment. Idiap is an equal
> >opportunity employer and is actively involved in the "Advancement of
> >Women in Science" European initiative. The Institute seeks to maintain a
> >principle of open competition (on the basis of merit) to appoint the
> >best candidate, provides equal opportunity for all candidates, and
> >equally encourages both genders to apply.
> >
> >APB
> >--
> >Idiap Research Institute | tel: (41 27) 721 7729
> >Centre du Parc, CP 592 | fax: (41 27) 721 7712
> >CH-1920 Martigny | [email protected]
> >Switzerland | www.idiap.ch/~apbelis
> >_______________________________________________
> >Mt-list mailing list
> >[email protected]
> >http://mailhost.computing.dcu.ie/mailman/listinfo/mt-list
