Hi,

I can't get the -xml-input flag to work. More precisely, the translation I provide in the XML markup is ignored, and the markup itself is discarded.

When translating 'das ist ein <n translation='yoyo'>kleines</n> haus' with the option -xml-input exclusive, I expected to get 'this is a yoyo house'. I also tried the older 'english' XML attribute, with the same result. Can someone tell me what I am doing wrong, or explain what is going on?

I tried both the sample model discussed on p. 21 of the user guide (http://www.statmt.org/moses/download/sample-models.tgz) and a model of my own. I'm using the Cygwin pre-compiled version of Moses 1.0, downloaded on Jan 29th.

BTW, is there a way to make the decoder print its version?

Thank you!
JL

First attempt, with the 'translation' attribute:

echo 'das ist ein <n translation='yoyo'>kleines</n> haus' | /c/moses10/bin/moses -f phrase-model/moses.ini -xml-input exclusive

Defined parameters (per moses.ini or switch):
  config: phrase-model/moses.ini
  input-factors: 0
  lmodel-file: 8 0 3 lm/europarl.srilm.gz
  mapping: T 0
  n-best-list: nbest.txt 100
  ttable-file: 0 0 0 1 phrase-model/phrase-table
  ttable-limit: 10
  weight-d: 1
  weight-l: 1
  weight-t: 1
  weight-w: 0
  xml-input: exclusive
/c/moses10/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel lm/europarl.srilm.gz : [0.000] seconds
ScoreProducer: LM start: 3 end: 4
Loading the LM will be faster if you build a binary file.
Reading lm/europarl.srilm.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**The ARPA file is missing <unk>. Substituting log10 probability -100.000.
**************************************************************************************************
Finished loading LanguageModels : [1.061] seconds
Start loading PhraseTable phrase-model/phrase-table : [1.061] seconds
filePath: phrase-model/phrase-table
ScoreProducer: PhraseModel start: 4 end: 5
Finished loading phrase tables : [1.061] seconds
Start loading phrase table from phrase-model/phrase-table : [1.061] seconds
Reading phrase-model/phrase-table
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Finished loading phrase tables : [1.063] seconds
IO from STDOUT/STDIN
Created input-output object : [1.063] seconds
Translating line 0 in thread id 0x80047030
Translating: das ist ein kleines haus
Line 0: Collecting options took 0.000 seconds
Line 0: Search took 0.002 seconds
this is a small house
BEST TRANSLATION: this is a small house [11111] [total=-28.923] core=(0.000,-5.000,0.000,-27.091,-1.833)
Line 0: Translation took 0.007 seconds total
user 1.045
sys 0.031
VmRSS: 34560 kB

Second attempt, with the older 'english' attribute:

echo 'das ist ein <n english='yoyo'>kleines</n> haus' | /c/moses10/bin/moses -f phrase-model/moses.ini -xml-input exclusive

Defined parameters (per moses.ini or switch):
  config: phrase-model/moses.ini
  input-factors: 0
  lmodel-file: 8 0 3 lm/europarl.srilm.gz
  mapping: T 0
  n-best-list: nbest.txt 100
  ttable-file: 0 0 0 1 phrase-model/phrase-table
  ttable-limit: 10
  weight-d: 1
  weight-l: 1
  weight-t: 1
  weight-w: 0
  xml-input: exclusive
/c/moses10/bin
ScoreProducer: Distortion start: 0 end: 1
ScoreProducer: WordPenalty start: 1 end: 2
ScoreProducer: !UnknownWordPenalty start: 2 end: 3
Loading lexical distortion models...have 0 models
Start loading LanguageModel lm/europarl.srilm.gz : [0.000] seconds
ScoreProducer: LM start: 3 end: 4
Loading the LM will be faster if you build a binary file.
Reading lm/europarl.srilm.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**The ARPA file is missing <unk>. Substituting log10 probability -100.000.
**************************************************************************************************
Finished loading LanguageModels : [1.050] seconds
Start loading PhraseTable phrase-model/phrase-table : [1.050] seconds
filePath: phrase-model/phrase-table
ScoreProducer: PhraseModel start: 4 end: 5
Finished loading phrase tables : [1.050] seconds
Start loading phrase table from phrase-model/phrase-table : [1.051] seconds
Reading phrase-model/phrase-table
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Finished loading phrase tables : [1.052] seconds
IO from STDOUT/STDIN
Created input-output object : [1.052] seconds
Translating line 0 in thread id 0x80047030
Translating: das ist ein kleines haus
Line 0: Collecting options took 0.000 seconds
Line 0: Search took 0.002 seconds
this is a small house
BEST TRANSLATION: this is a small house [11111] [total=-28.923] core=(0.000,-5.000,0.000,-27.091,-1.833)
Line 0: Translation took 0.008 seconds total
user 1.060
sys 0.015
VmRSS: 34560 kB

For reference, this is how the user guide describes the exclusive mode:

  exclusive: Only the XML-specified translation is used for the input phrase. Any phrases from the phrase table that overlap with that span are ignored.

Jean-Luc Meunier │ Senior Research Engineer │ Xerox Research Centre Europe │ 6 chemin de Maupertuis 38240 MEYLAN │ +33 (0)4 76 61 50 18
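P.S. One thing that may be worth ruling out: both commands nest single quotes inside single quotes, so the shell may never pass the quotes around 'yoyo' on to the decoder. The small check below uses Python 3's standard `shlex` module, which applies POSIX-shell-like splitting rules, as a stand-in for bash (an assumption; it is not the shell itself). The double-quoted `alternative` line is a hypothetical variant for comparison only; I have not checked how Moses itself parses attribute values.

```python
import shlex

# The command line exactly as typed above: the outer single quotes are
# closed and reopened by the quotes around yoyo.
typed = "echo 'das ist ein <n translation='yoyo'>kleines</n> haus'"

# Hypothetical variant: double quotes around the attribute value, kept
# inside outer single quotes so the shell passes them through unchanged.
alternative = "echo 'das ist ein <n translation=\"yoyo\">kleines</n> haus'"

for cmd in (typed, alternative):
    # shlex.split returns the argument list; element 1 is the string
    # that echo would write to the decoder's stdin.
    print(shlex.split(cmd)[1])
```

With the command as typed, the decoder should see `translation=yoyo` with no quotes at all, whereas the variant keeps `translation="yoyo"` intact.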
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
