Hi Tom

Yes, this could be caused by unusual characters in the corpus. If you search for 118049 in the vocab file (target side - es.vcb I think) then you may see what the problem is.
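
If searching by hand is awkward, a short script can do the same check. The following is only a rough sketch, not part of Moses or MGIZA++. It assumes the usual GIZA++ layouts: each .vcb line is "id token count", and the .snt file holds three lines per sentence pair (pair count, source ids, target ids). The function names load_vcb and missing_ids are made up for illustration.

```python
def load_vcb(lines):
    """Collect word ids from .vcb lines; also return lines that look broken.

    Assumes each well-formed line is "id token count". A token that contains
    a whitespace-like character splits into extra fields and is flagged.
    """
    ids = set()
    malformed = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            malformed.append((lineno, line.rstrip("\n")))
            continue
        ids.add(int(fields[0]))
    return ids, malformed


def missing_ids(snt_lines, src_ids, tgt_ids):
    """List (pair number, side, id) for every id absent from its vocabulary.

    Assumes the .snt layout is three lines per sentence pair:
    pair count, source word ids, target word ids.
    """
    missing = []
    it = iter(snt_lines)
    # zip(it, it, it) consumes the same iterator three lines at a time
    for pair_no, (_count, src, tgt) in enumerate(zip(it, it, it), start=1):
        for side, sent, vocab in (("source", src, src_ids),
                                  ("target", tgt, tgt_ids)):
            for tok in sent.split():
                if int(tok) not in vocab:
                    missing.append((pair_no, side, int(tok)))
    return missing
```

To use it, open the two .vcb files and the .snt file (with encoding="utf-8", errors="replace") and pass the file objects in, taking care to give the vocabularies in the same source/target order that the log reports. Any entries in the malformed list, or the sentence pairs reported for the missing id, should point to the lines in the corpus that need cleaning.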

cheers - Barry

On 20/05/13 14:40, Tom Hoar wrote:
> The train-model.perl script from Beta 0.91 configured for MGIZA++ failed
> on step 2.1b in the reverse direction with the error below. I think this
> might be a result of inadequate cleaning. Can anyone confirm this or
> offer an alternate reason? Thanks.
>
> m5p0 = -1 (fixed value for parameter p_0 in IBM-5 (if negative then it
> is determined in training))
> manlexfactor1 = 0 ()
> manlexfactor2 = 0 ()
> manlexmaxmultiplicity = 20 ()
> maxfertility = 10 (maximal fertility for fertility models)
> ncpus = 1 (Number of threads to be executed, use 0 if you just want all
> CPUs to be used)
> p0 = 0.999 (fixed value for parameter p_0 in IBM-3/4 (if negative then
> it is determined in training))
> pegging = 0 (0: no pegging; 1: do pegging)
> reading vocabulary files
> Reading vocabulary file
> from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/en.vcb
> Reading vocabulary file
> from:/opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es.vcb
> Source vocabulary list has 85970 unique tokens
> Target vocabulary list has 84643 unique tokens
> Calculating vocabulary frequencies from corpus
> /opt/domy/TRAININGS/alignments/align-dell2_full-en-es/giza.classes/es-en-int-train.snt
> Reading more sentence pairs into memory ...
> ERROR: target word 118049 is not in the vocabulary list
> Exit code: 255
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
