Fixed. There was indeed an extra space in a token in the devel corpus,
at line 1791, close to the end of the file, not in the phrase table.
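
For anyone who hits the same error, here is a minimal sketch of a check that would flag this kind of line. It is not part of Moses; the function name `find_malformed_tokens` is made up for illustration, and it assumes two factors per word (form|POS, as in the input above) and whitespace-separated tokens. A stray space inside a token such as "la |RD" shows up as two suspect tokens, one with too few factors and one with an empty factor.

```python
def find_malformed_tokens(lines, num_factors=2, sep="|"):
    """Yield (line_number, token_index, token) for each suspect token.

    Assumption: every token should consist of exactly `num_factors`
    non-empty factors joined by `sep`. Line and token numbers are 1-based.
    """
    for line_no, line in enumerate(lines, start=1):
        # str.split() with no argument splits on any run of whitespace.
        for tok_idx, token in enumerate(line.split(), start=1):
            factors = token.split(sep)
            wrong_count = len(factors) != num_factors
            has_empty = any(factor == "" for factor in factors)
            if wrong_count or has_empty:
                yield line_no, tok_idx, token


# Small in-memory demo; the second line contains a stray space in "la|RD".
sample = [
    "questa|DD nostra|A dichiarazione|S",
    "la |RD prima|NO del|EA millennio|S .|FS",
]
for line_no, tok_idx, token in find_malformed_tokens(sample):
    print(f"line {line_no}, token {tok_idx}: {token!r}")
```

To run it on a real file, pass an open file handle instead of the sample list; the reported line numbers then match those of the file.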

Thanks to Alexander for pointing me in the right direction.

-- Beppe

On 4/14/2011 10:27, Alexander Fraser wrote:
> If you checked for control characters as well, then I'd next write a
> quick script to parse the left-hand-side of the filtered phrase table
> and look for the malformed line (you are looking for a token with no |
> symbol).
>
> It would be good if Moses just told you which line that is. The code
> to do this is commented out in Phrase.cpp, line 176, possibly
> because it isn't general enough to work with all of the different
> phrase table data structures.
>
> Cheers, Alex
>
>
> On Thu, Apr 14, 2011 at 9:52 AM, Giuseppe Attardi wrote:
>> Good guess, but there is no | in the corpus:
>>
>> grep -c '|' europarl.it europarl.en
>> europarl.it:0
>> europarl.en:0
>>
>> Greetings from Pisa.
>>
>> -- Beppe
>>
>> On 4/14/2011 09:11, Alexander Fraser wrote:
>>> Hi Beppe,
>>>
>>> This error probably means you have a malformed phrase table.
>>>
>>> Look for a pipe character or control characters in your training data
>>> (the parallel corpus you estimated the phrase table from) and replace
>>> them.
>>>
>>> Greetings from Stuttgart, Alex
>>>
>>>
>>> On Thu, Apr 14, 2011 at 8:23 AM, Giuseppe Attardi wrote:
>>>> I trained a factored model with form and POS input factors.
>>>> However, the decoder dies during tuning with this message:
>>>>
>>>> Translating line 0  in thread id 47090098366736
>>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>>> la|RD prima|NO del|EA millennio|S .|FS
>>>>
>>>> Collecting options took 0.010 seconds
>>>> [ERROR] Malformed input at
>>>>    Expected input to have words composed of 2 factor(s) (form
>>>> FAC1|FAC2|...)
>>>>    but instead received input with 1 factor(s).
>>>> sh: line 1: 30773 Aborted                 /MT/tools/bin/moses -config
>>>> filtered/moses.ini -inputtype 0 -w -0.217391 -lm 0.108696 -d 0.065217
>>>> 0.065217 0.065217 0.065217 0.065217 0.065217 0.065217 -tm 0.043478
>>>> 0.043478 0.043478 0.043478 0.043478 -n-best-list run1.best100.out 100
>>>> -input-file /MT/model/apr01pos/corpora/devel.it > run1.out
>>>> Exit code: 134
>>>>
>>>> The input is the same that was successfully handled by a smaller model
>>>> built on a portion of the same data:
>>>>
>>>> Translating line 0  in thread id 47376215517456
>>>> Translating: questa|DD nostra|A dichiarazione|S dei|EA diritti|S è|V
>>>> la|RD prima|NO del|EA millennio|S .|FS
>>>>
>>>> Collecting options took 0.000 seconds
>>>> Search took 0.120 seconds
>>>> BEST TRANSLATION: this nostra|A|UNK|UNK explanation of diritti|S|UNK|UNK
>>>> is the first millennium bug . [11111111111]  [total=-206.509]<<0.000,
>>>> -11.000, -200.000, -54.388, -6.922, -9.213, -1.653, -9.136, 6.999>>    0-0
>>>> reset caches
>>>> Translation took 0.130 seconds
>>>>
>>>> -- Beppe
>>>>
>>

