[Moses-support] the best way to design the mapping steps for training the factored translation model (English to German)
Dear all,

I want to train a morphological analysis and generation model for Moses, which will then be used for translation from English to German. I have prepared my training data like this:

    % tail -n 1 factored-corpus/proj-syndicate.??
    ==> factored-corpus/proj-syndicate.en <==
    corruption|corruption|nn flourishes|flourish|nns .|.|.
    ==> factored-corpus/proj-syndicate.de <==
    korruption|korruption|nn|nn.fem.cas.sg floriert|florieren|vvfin|vvfin .|.|per|per

Each word is represented not only by its surface form but also by additional factors. For both English and German, the factors are surface form, lemma, part of speech and morphology.

What is the best way to design the mapping steps for training the factored translation model? Can you help me?

BTW, for your reference, I have designed a total of four mapping steps, as below:

    % train-model.perl \
        --corpus factored-corpus/…… \
        --root-dir morphgen \
        --f de --e en \
        --lm 0:3:factored-corpus/surface.lm:0 \
        --lm 2:3:factored-corpus/pos.lm:0 \
        --translation-factors 1-1+2-2,3 \
        --generation-factors 1-2+1,2,3-0 \
        --decoding-steps t0,g0,t1,g1 \

This design follows the Tutorial for Using Factored Models on the website:
http://www.statmt.org/moses/?n=Moses.FactoredTutorial#ntoc4

Your kind suggestions will be greatly appreciated!

Best Regards
Henry

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
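Editorial note: the factor indices in the command above are easier to follow when spelled out by name. The sketch below is not part of Moses. It assumes the factor order stated in the message (0 = surface, 1 = lemma, 2 = part of speech, 3 = morphology) and the usual reading of the mapping options, where "+" separates steps, "-" separates input from output factors, and "," lists several factors. The names FACTOR_NAMES, parse_token, parse_mapping and describe are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of Moses.

# Assumed factor order, taken from the message: 0=surface, 1=lemma, 2=pos, 3=morph.
FACTOR_NAMES = ("surface", "lemma", "pos", "morph")


def parse_token(token):
    """Split a factored token like 'floriert|florieren|vvfin|vvfin' into named factors."""
    # zip() stops at the shorter sequence, so a 3-factor token simply has no 'morph'.
    return dict(zip(FACTOR_NAMES, token.split("|")))


def parse_mapping(spec):
    """Parse a spec like '1-1+2-2,3' into a list of (input_factors, output_factors)."""
    steps = []
    for step in spec.split("+"):
        source, target = step.split("-")
        steps.append(
            ([int(i) for i in source.split(",")], [int(i) for i in target.split(",")])
        )
    return steps


def describe(spec):
    """Render each mapping step with factor names instead of indices."""
    lines = []
    for inputs, outputs in parse_mapping(spec):
        left = ",".join(FACTOR_NAMES[i] for i in inputs)
        right = ",".join(FACTOR_NAMES[i] for i in outputs)
        lines.append(f"{left} -> {right}")
    return lines


if __name__ == "__main__":
    print(parse_token("korruption|korruption|nn|nn.fem.cas.sg"))
    print(describe("1-1+2-2,3"))    # translation steps t0, t1
    print(describe("1-2+1,2,3-0"))  # generation steps g0, g1
```

Under these assumptions, the decoding order t0,g0,t1,g1 reads as: translate the lemma, generate a part of speech from the lemma, translate the part of speech into part of speech plus morphology, then generate the surface form from lemma, part of speech and morphology.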
Re: [Moses-support] about Morph tagging
Thank you very much!

BTW, I'm studying Morphisto now, which is a morphological analyzer for German.
http://code.google.com/p/morphisto/

I may also use the relevant HFST tools as morphological analyzers for other languages.

Best Regards
Henry

-----Original Message-----
From: Francis Tyers [mailto:fty...@prompsit.com]
Sent: 20 October 2010 18:13
To: JiaHongwei
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] about Morph tagging

You could use the morphological analysers from the Apertium project.

http://wiki.apertium.org/wiki/Using_an_lttoolbox_dictionary
http://wiki.apertium.org/wiki/Lttoolbox
http://wiki.apertium.org/wiki/HFST

Fran

On Wed, 20 Oct 2010 at 17:58 +0800, JiaHongwei wrote:

Hi,

I need to train a model with POS tags and morphological information for Moses, involving languages such as German, Spanish, French and Italian. Using TreeTagger, I can get POS tags in the format 'form pos lemma', but I want the output processed further into the format 'form pos lemma morph'. So the job is to take 'form pos lemma' as input and produce 'form pos lemma morph' as output. Could you recommend a way or a tool to do this automatically or in a pipeline?

Thanks in advance!

Best Regards
Henry
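Editorial note: the pipeline step described in this thread (tagger output in, factored tokens out) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested integration with any of the tools named above. It assumes one tab-separated 'form pos lemma' line per token and one call per sentence, and it emits tokens in the 'surface|lemma|pos|morph' order used in Moses factored corpora. The function name treetagger_to_factored and the morph_of callback are hypothetical; morph_of stands in for whichever morphological analyser is eventually chosen.

```python
# Illustrative sketch only; the function name and the morph_of callback are
# hypothetical and do not correspond to any TreeTagger, Apertium or Moses API.

def treetagger_to_factored(tagged_lines, morph_of=None):
    """Turn one sentence of 'form<TAB>pos<TAB>lemma' lines into one factored line.

    morph_of, if given, is called as morph_of(form, pos, lemma) and must return
    the morphology string to append as a fourth factor.
    """
    tokens = []
    for line in tagged_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines between tokens
        form, pos, lemma = line.split("\t")
        factors = [form, lemma, pos]
        if morph_of is not None:
            factors.append(morph_of(form, pos, lemma))
        # Factors within a token are joined by '|', tokens by single spaces.
        tokens.append("|".join(factors))
    return " ".join(tokens)


if __name__ == "__main__":
    sentence = ["korruption\tNN\tkorruption", "floriert\tVVFIN\tflorieren"]
    # A dummy analyser that just echoes the tag, purely for demonstration.
    print(treetagger_to_factored(sentence, morph_of=lambda f, p, l: p.lower()))
```

A real pipeline would also need to handle tokens that themselves contain "|", since that character separates factors.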
Re: [Moses-support] best pos tagger for tagging text in German, Italian, Spanish, Italian
Thanks a lot! I'd like to give SVMTool a try. BTW, does it support generating morphological information? And which tools would you recommend if I want to use a part-of-speech tagger together with generated morphological information?

Best Regards
Henry

________________________________
From: Jesús Giménez [mailto:jgime...@lsi.upc.edu]
Sent: 13 October 2010 17:53
To: JiaHongwei
Cc: moses-support@mit.edu
Subject: Re: [Moses-support] best pos tagger for tagging text in German, Italian, Spanish, Italian

hi Henry,

I may also recommend you SVMTool, a state-of-the-art open source part-of-speech tagger (and generator of sequential taggers). The latest development version includes tagging models for English, Spanish and Catalan (both case-sensitive and case-insensitive models). Besides, in the short term we plan to include models for French, Romanian, Czech, Italian and German.

* svn co svn://biniki.lsi.upc.edu/svmtool/trunk svmtool

It's implemented in Perl. There is also a ~10 times faster C++ version (SVMTool++). Models are fully compatible. You might want to consider using it for massive text processing.

* svn co svn://biniki.lsi.upc.edu/svmtool++/trunk svmtool++

-- jesus

On 09/10/10 07:22, JiaHongwei wrote:

Hi,

I want to use a factored model for translating text in German, Italian, Spanish and French to English using Moses, and I noticed there is a POS tagging step before the factored model is used in decoding. Can you recommend a good tagger for the translation directions I mentioned? I also wonder whether the POS tagging result can be improved by using a combined set of POS taggers. If so, how can I do that?

Your support will be really appreciated!

Thanks a lot
Henry