I will try to look for weird characters in the input. In the meantime, here is something surprising: if I pass only the first 20 lines of the input to the failing command, it works:
> head -20 /MT/model/apr01pos/corpora/devel.it > devel20.it
> /MT/tools/bin/moses -config filtered/moses.ini -inputtype 0 -w -0.217391 -lm 0.108696 -d 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478 0.043478 0.043478 0.043478 0.043478 -n-best-list run1.best100.out 100 < devel20.it
...
Loading lexical distortion models...have 1 models
Creating lexical reordering...
weights: 0.065 0.065 0.065 0.065 0.065 0.065
Loading table into memory...done.
Start loading LanguageModel /MT/corpora/lm/english.form.blm.mm : [118.000] seconds
In LanguageModelIRST::Load: nGramOrder = 5
IRSTLM Language Model Type of /MT/corpora/lm/english.form.blm.mm is 1
Loading LM file (no MAP)
blmt
loadbin()
mapping 986991 1-grams
mapping 21801039 2-grams
mapping 51632143 3-grams
mapping 77826681 4-grams
mapping 0 5-grams
done
OOV code is 986990
IRST: m_unknownId=986990
Finished loading LanguageModels : [122.000] seconds
Start loading PhraseTable /MT/model/apr01pos/it-en/phrase-table : [122.000] seconds
filePath: /MT/model/apr01pos/it-en/phrase-table
Finished loading phrase tables : [122.000] seconds
IO from STDOUT/STDIN
Created input-output object : [122.000] seconds
Translating line 0 in thread id 140457548929296
Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V la|RD prima|NO del|EA millennio|S .|FS
reading bin ttable
size of OFF_T 8
binary phrasefile loaded, default OFF_T: -1
Collecting options took 3.820 seconds
Search took 4.620 seconds
our Bill of Rights is the first of the millennium .
BEST TRANSLATION: our Bill of Rights is the first of the millennium . [11111111111] [total=-7.288] <<0.000, -11.000, 0.000, -1.115, 0.000, 0.000, -2.413, 0.000, 0.000, -63.429, -13.782, -27.623, -4.490, -17.853, 4.999>>
reset caches
Translation took 4.640 seconds
Finished translating
...
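Since the first 20 lines go through, the offending token may sit further down in devel.it. A minimal sketch of a check over the whole input file follows. It reports any token that does not have exactly two factors, has an empty factor, or contains a control or unusual whitespace character. The function name `find_malformed_tokens` is hypothetical, and the script assumes that the decoder splits tokens on ASCII spaces and tabs only, which I have not verified against the Moses source.

```python
import re
import sys
import unicodedata


def find_malformed_tokens(lines, expected_factors=2, separator="|"):
    """Return (line_number, token, reason) tuples for suspicious tokens.

    Assumption: tokens are delimited by ASCII spaces and tabs only, so
    characters such as a non-breaking space stay inside a token and can
    be reported instead of being silently split away.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        stripped = line.strip(" \t\r\n")
        if not stripped:
            # Empty lines carry no tokens; they are skipped, not reported.
            continue
        for token in re.split(r"[ \t]+", stripped):
            factors = token.split(separator)
            if len(factors) != expected_factors:
                problems.append(
                    (line_no, token, "found %d factor(s)" % len(factors))
                )
            elif any(factor == "" for factor in factors):
                problems.append((line_no, token, "empty factor"))
            elif any(
                ch.isspace() or unicodedata.category(ch).startswith("C")
                for ch in token
            ):
                problems.append(
                    (line_no, token, "control or unusual whitespace character")
                )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    # errors="replace" keeps the scan going over invalid UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line_no, token, reason in find_malformed_tokens(handle):
            print("%s:%d: %r: %s" % (path, line_no, token, reason))
```

Saved under a name of your choice and run with the path to devel.it as its only argument, it prints one line per suspicious token with its line number. Note that this inspects the decoder input, not the source side of the filtered phrase table, which would need a different split on the ||| field separator.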
-- Beppe

On 4/14/2011 10:27, Alexander Fraser wrote:
> If you checked for control-characters as well, then I'd next write a
> quick script to parse the left-hand-side of the filtered phrase table
> and look for the malformed line (you are looking for a token with no |
> symbol).
>
> It would be good if Moses just told you which line that is. The code
> to do this is commented out in Phrase.cpp, line 176, this could be
> because it isn't general enough to work with all of the different
> phrase table data structures.
>
> Cheers, Alex
>
>
> On Thu, Apr 14, 2011 at 9:52 AM, Giuseppe Attardi<[email protected]> wrote:
>> Good guess, but there is no | in the corpus:
>>
>> grep -c '|' europarl.it europarl.en
>> europarl.it:0
>> europarl.en:0
>>
>> Greetings from Pisa.
>>
>> -- Beppe
>>
>> On 4/14/2011 09:11, Alexander Fraser wrote:
>>> Hi Beppe,
>>>
>>> This error probably means you have a malformed phrase table.
>>>
>>> Look for a pipe character or control characters in your training data
>>> (the parallel corpus you estimated the phrase table from) and replace
>>> them.
>>>
>>> Greetings from Stuttgart, Alex
>>>
>>>
>>> On Thu, Apr 14, 2011 at 8:23 AM, Giuseppe Attardi<[email protected]>
>>> wrote:
>>>> I trained a factored model with input form and pos factors.
>>>> However the decoder dies during tuning with this message
>>>>
>>>> Translating line 0 in thread id 47090098366736
>>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>>> la|RD prima|NO del|EA millennio|S .|FS
>>>>
>>>> Collecting options took 0.010 seconds
>>>> [ERROR] Malformed input at
>>>> Expected input to have words composed of 2 factor(s) (form
>>>> FAC1|FAC2|...)
>>>> but instead received input with 1 factor(s).
>>>> sh: line 1: 30773 Aborted /MT/tools/bin/moses -config
>>>> filtered/moses.ini -inputtype 0 -w -0.217391 -lm 0.108696 -d 0.065217
>>>> 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478
>>>> 0.043478 0.043478 0.043478 0.043478 -n-best-list run1.best100.out 100
>>>> -input-file /MT/model/apr01pos/corpora/devel.it> run1.out
>>>> Exit code: 134
>>>>
>>>> The input is the same that was successfully handled by a smaller model
>>>> built on a portion of the same data:
>>>>
>>>> Translating line 0 in thread id 47376215517456
>>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>>> la|RD prima|NO del|EA millennio|S .|FS
>>>>
>>>> Collecting options took 0.000 seconds
>>>> Search took 0.120 seconds
>>>> BEST TRANSLATION: this nostra|A|UNK|UNK explanation of diritti|S|UNK|UNK
>>>> is the first millennium bug . [11111111111] [total=-206.509]<<0.000,
>>>> -11.000, -200.000, -54.388, -6.922, -9.213, -1.653, -9.136, 6.999>> 0-0
>>>> reset caches
>>>> Translation took 0.130 seconds
>>>>
>>>> -- Beppe
>>>>
>> _______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
