Fixed. There was indeed an extra space in a token in the devel corpus, at line 1791, close to the end of the file, not in the phrase table.
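For anyone who runs into the same error: a check along these lines would probably have found the offending line directly. It is only a sketch, assuming whitespace-separated tokens whose factors are joined by "|", as in the input shown further down in this thread. The function name and the script name in the usage comment are made up for illustration and are not part of Moses.

```python
import sys


def find_malformed_tokens(lines, num_factors=2, sep="|"):
    """Yield (line_number, token) for each token with a wrong factor count.

    A stray space inside a token such as "nostra|A" produces two pieces,
    "nostra" and "|A". The first has too few factors and the second has an
    empty factor, so both are reported.
    """
    for lineno, line in enumerate(lines, start=1):
        for token in line.split():
            factors = token.split(sep)
            if len(factors) != num_factors or "" in factors:
                yield lineno, token


# Hypothetical usage: python check_factors.py devel.it
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as corpus:
        for lineno, token in find_malformed_tokens(corpus):
            print(f"line {lineno}: malformed token {token!r}")
```

The number of factors is a parameter because it depends on how the model was trained; 2 matches the form and pos factors used here.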
Thanks to Alexander for pointing me in the right direction.

-- Beppe

On 4/14/2011 10:27, Alexander Fraser wrote:
> If you checked for control-characters as well, then I'd next write a
> quick script to parse the left-hand-side of the filtered phrase table
> and look for the malformed line (you are looking for a token with no |
> symbol).
>
> It would be good if Moses just told you which line that is. The code
> to do this is commented out in Phrase.cpp, line 176, this could be
> because it isn't general enough to work with all of the different
> phrase table data structures.
>
> Cheers, Alex
>
>
> On Thu, Apr 14, 2011 at 9:52 AM, Giuseppe Attardi<[email protected]> wrote:
>> Good guess, but there is no | in the corpus:
>>
>> grep -c '|' europarl.it europarl.en
>> europarl.it:0
>> europarl.en:0
>>
>> Greetings from Pisa.
>>
>> -- Beppe
>>
>> On 4/14/2011 09:11, Alexander Fraser wrote:
>>> Hi Beppe,
>>>
>>> This error probably means you have a malformed phrase table.
>>>
>>> Look for a pipe character or control characters in your training data
>>> (the parallel corpus you estimated the phrase table from) and replace
>>> them.
>>>
>>> Greetings from Stuttgart, Alex
>>>
>>>
>>> On Thu, Apr 14, 2011 at 8:23 AM, Giuseppe Attardi<[email protected]>
>>> wrote:
>>>> I trained a factored model with input form and pos factors.
>>>> However the decoder dies during tuning with this message
>>>>
>>>> Translating line 0 in thread id 47090098366736
>>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>>> la|RD prima|NO del|EA millennio|S .|FS
>>>>
>>>> Collecting options took 0.010 seconds
>>>> [ERROR] Malformed input at
>>>> Expected input to have words composed of 2 factor(s) (form
>>>> FAC1|FAC2|...)
>>>> but instead received input with 1 factor(s).
>>>> sh: line 1: 30773 Aborted /MT/tools/bin/moses -config
>>>> filtered/moses.ini -inputtype 0 -w -0.217391 -lm 0.108696 -d 0.065217
>>>> 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478
>>>> 0.043478 0.043478 0.043478 0.043478 -n-best-list run1.best100.out 100
>>>> -input-file /MT/model/apr01pos/corpora/devel.it> run1.out
>>>> Exit code: 134
>>>>
>>>> The input is the same that was successfully handled by a smaller model
>>>> built on a portion of the same data:
>>>>
>>>> Translating line 0 in thread id 47376215517456
>>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>>> la|RD prima|NO del|EA millennio|S .|FS
>>>>
>>>> Collecting options took 0.000 seconds
>>>> Search took 0.120 seconds
>>>> BEST TRANSLATION: this nostra|A|UNK|UNK explanation of diritti|S|UNK|UNK
>>>> is the first millennium bug . [11111111111] [total=-206.509]<<0.000,
>>>> -11.000, -200.000, -54.388, -6.922, -9.213, -1.653, -9.136, 6.999>> 0-0
>>>> reset caches
>>>> Translation took 0.130 seconds
>>>>
>>>> -- Beppe
>>>>
>> _______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
