Hi Tom

Yes, this could be caused by unusual characters in the corpus. If you
search for ID 118049 in the target-side vocab file (es.vcb, I think),
you may see what the problem is.
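
A rough sketch of that check in Python, in case it helps. It is not part of Moses or MGIZA++. It assumes each line of the .vcb file has the form "id token count" separated by whitespace, and the function names lookup_id and suspicious_lines are made up for illustration. The sample lines are invented, not taken from your data.

```python
def lookup_id(lines, wanted_id):
    """Return (line_number, line) for every vocab line whose first field is wanted_id."""
    hits = []
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if fields and fields[0] == str(wanted_id):
            hits.append((lineno, raw.rstrip("\n")))
    return hits


def suspicious_lines(lines):
    """Return vocab lines that do not look like 'id token count'.

    A line is flagged when it does not split into exactly three
    whitespace-separated fields (str.split() also splits on Unicode
    whitespace such as a no-break space), or when it contains
    non-printable characters.
    """
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if len(line.split()) != 3 or not line.isprintable():
            bad.append((lineno, line))
    return bad


# Invented sample data; for a real file, pass an open file object instead, e.g.
#   with open("es.vcb", encoding="utf-8", errors="replace") as f:
#       lines = f.readlines()
sample = [
    "2 the 100\n",
    "3 foo\u00a0bar 4\n",   # token contains a no-break space
    "118049 caf\u00e9 1\n",
]

# repr() makes otherwise invisible characters show up as escapes.
for lineno, line in lookup_id(sample, 118049):
    print("found:", lineno, repr(line))
for lineno, line in suspicious_lines(sample):
    print("suspicious:", lineno, repr(line))
```

If the lookup finds nothing, the ID is missing from the vocab file altogether, and the suspicious lines are probably the first place to look.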

cheers - Barry

On 20/05/13 14:40, Tom Hoar wrote:
> The train-model.perl script from Beta 0.91 configured for MGIZA++ failed
> on step 2.1b in the reverse direction with the error below. I think this
> might be a result of inadequate cleaning. Can anyone confirm this or
> offer an alternate reason? Thanks.
>
> m5p0 = -1 (fixed value for parameter p_0 in IBM-5 (if negative then it
> is determined in training))
> manlexfactor1 = 0 ()
> manlexfactor2 = 0 ()
> manlexmaxmultiplicity = 20 ()
> maxfertility = 10 (maximal fertility for fertility models)
> ncpus = 1 (Number of threads to be executed, use 0 if you just want all
> CPUs to be used)
> p0 = 0.999 (fixed value for parameter p_0 in IBM-3/4 (if negative then
> it is determined in training))
> pegging = 0 (0: no pegging; 1: do pegging)
> reading vocabulary files
> Reading vocabulary file
> from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/en.vcb
> Reading vocabulary file
> from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es.vcb
> Source vocabulary list has 85970 unique tokens
> Target vocabulary list has 84643 unique tokens
> Calculating vocabulary frequencies from corpus
> /opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es-en-int-train.snt
> Reading more sentence pairs into memory ...
> ERROR: target word 118049 is not in the vocabulary list
> Exit code: 255
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
