If you checked for control characters as well, then I'd next write a quick script to parse the left-hand side of the filtered phrase table and look for the malformed line (you are looking for a token with no | symbol).
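A minimal sketch of such a script, assuming the filtered phrase table is in the usual Moses text format (fields separated by `|||`, source phrase first, tokens separated by single spaces) and may be gzip-compressed. The names `open_text` and `find_malformed` are made up for this example, and the expected factor count of 2 matches the form|pos setup described in this thread.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed file as UTF-8 text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def find_malformed(lines, num_factors=2):
    """Yield (line_number, token, line) for the first bad source token per line.

    A token is considered bad if it does not split into exactly
    `num_factors` non-empty pieces on '|'.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # The source (left-hand) side is everything before the first '|||'.
        source = text.split("|||", 1)[0].strip(" ")
        # Split on plain spaces only, so that control characters stay
        # inside the token and show up in the report.
        for token in source.split(" "):
            if token == "":
                continue  # assumption: runs of spaces are harmless
            factors = token.split("|")
            if len(factors) != num_factors or "" in factors:
                yield lineno, token, text
                break


if __name__ == "__main__" and len(sys.argv) > 1:
    expected = int(sys.argv[2]) if len(sys.argv) > 2 else 2
    with open_text(sys.argv[1]) as handle:
        for lineno, token, text in find_malformed(handle, expected):
            # repr() makes hidden control characters visible.
            print(f"{lineno}: bad token {token!r} in: {text!r}")
```

Run it with the path to the filtered phrase table as the first argument and, optionally, the expected number of input factors as the second. Printing with `repr` is deliberate: a stray control character inside a token would otherwise be invisible in the terminal.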
It would be good if Moses just told you which line that is. The code to do this is commented out in Phrase.cpp, line 176; this could be because it isn't general enough to work with all of the different phrase table data structures.

Cheers, Alex

On Thu, Apr 14, 2011 at 9:52 AM, Giuseppe Attardi <[email protected]> wrote:
> Good guess, but there is no | in the corpus:
>
> grep -c '|' europarl.it europarl.en
> europarl.it:0
> europarl.en:0
>
> Greetings from Pisa.
>
> -- Beppe
>
> On 4/14/2011 09:11, Alexander Fraser wrote:
>>
>> Hi Beppe,
>>
>> This error probably means you have a malformed phrase table.
>>
>> Look for a pipe character or control characters in your training data
>> (the parallel corpus you estimated the phrase table from) and replace
>> them.
>>
>> Greetings from Stuttgart, Alex
>>
>>
>> On Thu, Apr 14, 2011 at 8:23 AM, Giuseppe Attardi<[email protected]>
>> wrote:
>>>
>>> I trained a factored model with input form and pos factors.
>>> However the decoder dies during tuning with this message
>>>
>>> Translating line 0 in thread id 47090098366736
>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>> la|RD prima|NO del|EA millennio|S .|FS
>>>
>>> Collecting options took 0.010 seconds
>>> [ERROR] Malformed input at
>>> Expected input to have words composed of 2 factor(s) (form
>>> FAC1|FAC2|...)
>>> but instead received input with 1 factor(s).
>>> sh: line 1: 30773 Aborted /MT/tools/bin/moses -config
>>> filtered/moses.ini -inputtype 0 -w -0.217391 -lm 0.108696 -d 0.065217
>>> 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478
>>> 0.043478 0.043478 0.043478 0.043478 -n-best-list run1.best100.out 100
>>> -input-file /MT/model/apr01pos/corpora/devel.it> run1.out
>>> Exit code: 134
>>>
>>> The input is the same that was successfully handled by a smaller model
>>> built on a portion of the same data:
>>>
>>> Translating line 0 in thread id 47376215517456
>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>> la|RD prima|NO del|EA millennio|S .|FS
>>>
>>> Collecting options took 0.000 seconds
>>> Search took 0.120 seconds
>>> BEST TRANSLATION: this nostra|A|UNK|UNK explanation of diritti|S|UNK|UNK
>>> is the first millennium bug . [11111111111] [total=-206.509]<<0.000,
>>> -11.000, -200.000, -54.388, -6.922, -9.213, -1.653, -9.136, 6.999>> 0-0
>>> reset caches
>>> Translation took 0.130 seconds
>>>
>>> -- Beppe
>>>
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
