Hi Tom

I added some extra debug and I get the following error:

[ERROR] Malformed input: '|'
In '.voto en contra de la resolución b6-0067 | 2004 del parlamento europeo 
sobre los procedimientos de ratificación del tratado por el que se establece 
una constitución para europa y la estrategia de comunicación relativa a dicho 
tratado .'
  Expected input to have words composed of 1 factor(s) (form FAC1|FAC2|...)
  but instead received input with 0 factor(s).
Aborted

This is at line 2230 of your input file, and now the problem is clear: a 
stray pipe, which Moses interprets as a factor delimiter.
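One common workaround is to escape literal pipes in the input before decoding. A minimal sketch (not part of the thread; assumes untokenized, single-factor input, and uses the `&#124;` entity that Moses's own tokenizer scripts use for this purpose):

```python
# Minimal sketch: replace a literal '|' with its HTML entity so the
# decoder does not read it as a factor delimiter. Assumption: the
# input is plain single-factor text, so every '|' is spurious.

def escape_pipes(line):
    """Return the line with each literal '|' replaced by '&#124;'."""
    return line.replace("|", "&#124;")

if __name__ == "__main__":
    bad = "resolución b6-0067 | 2004 del parlamento europeo"
    print(escape_pipes(bad))
```

For genuinely factored input this is of course wrong, since there the pipe is the delimiter; the escaping only makes sense when the model expects a single factor.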

It seems that if threads are enabled then moses will read in and queue the 
whole input file at startup. This is not generally a problem, as the input 
files we use are normally only a few thousand sentences, but it explains why 
the error was reported much further down the file than expected. I'll check 
in the extra debug code, because it should be quite useful in this context. 
Getting the line number would be useful too, but would require more work.
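A pre-flight check outside the decoder can supply the missing line number. A small sketch (a hypothetical helper, not part of Moses) that verifies every token has the expected factor count and reports the 1-based line number of each offender:

```python
# Minimal sketch: scan an input file before decoding and flag any
# token whose '|'-separated factor count differs from what the
# model expects. A bare '|' token splits into two empty factors,
# so it is caught too.

def check_factors(lines, num_factors=1):
    """Yield (line_no, token) for every token with the wrong
    number of '|'-separated factors."""
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            if len(token.split("|")) != num_factors:
                yield line_no, token

if __name__ == "__main__":
    corpus = ["una frase normal .",
              "resolución b6-0067 | 2004 ."]
    for line_no, token in check_factors(corpus):
        print(f"line {line_no}: bad token {token!r}")
```

Running this over the tuning set before calling mert-moses.pl would have pointed straight at line 2230.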

cheers - Barry

On Tuesday 28 June 2011 15:59, Tom Hoar wrote:
> I'm tuning a new ES-EN translation model. The tables were trained
> with about 1.75 million pairs from the Europarl v6 data using Moses
> w/KenLM SVN rev 4011 and IRSTLM 5.60.03. The attachments herewith
> include the run1.moses.ini file and the output log from mert-moses.pl
> that also includes the command line.
>
> If I run from a terminal command line:
>
> "$ moses -f run1.moses.ini < mert.es > run0.out"
>
> Moses terminates with the same error in the mert-moses.pl.log file.
> Piping any other file into moses as above also terminates with the same
> error. I also removed the [threads] value to run single threaded, and
> again, same terminal error.
>
> If I run in a terminal:
>
> "$ moses -f run1.moses.ini"
>
> then copy lines from the mert.es file and paste them into the terminal,
> they translate fine.
>
> Also, three days ago, a tuning/training session with the same moses
> build completed fine. It used a different training corpus, built from
> the same data with clean-corpus-n.perl and max tokens = 78. This corpus
> uses max tokens = 65 and extracted a different 2500 pairs for tuning.
> Those are the only differences between the two training corpora.
>
> I'm baffled. Any suggestions?
>
> Tom

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support