Hi,

I'd strongly recommend looking at experiment.perl, since it reduces the chance of small mistakes when typing in commands.

In your case there is a mismatch between the word alignment file that you produced and the corpus that it should apply to. This may be due to extra spaces, mismatches due to sentence filtering, failure to run the clean-corpus script, etc.

If you want to debug it the hard way, then

>> alignment point (28,29) out of range (0-14,0-9) in line 468517, ignoring

tells you quite a bit: the problem is sentence 468517, and the alignment contains an alignment point (28,29), which is not possible since the source/target sentences have only 15/10 words.

What is the GIZA++ alignment for this sentence?
What are the source and target sentences?
What is the symmetrized alignment?

-phi

On Tue, Dec 11, 2012 at 10:58 PM, Cuong Hoang <[email protected]> wrote:
> Here is the list of commands I am running:
>
> rm -r ./model
> rm ./giza.en-fr/*
> rm ./giza.fr-en/*
> rm -r ./corpus
>
> /home/cuongh/mosesdecoder/scripts/training/train-model.perl -mgiza
> -mgiza-cpus 24 -cores 24 -parallel -sort-buffer-size 10G -sort-batch-size
> 1021 -sort-compress gzip -sort-parallel 24 -root-dir
> /home/cuongh/mosesdecoder -corpus
> /home/cuongh/mosesdecoder/corpus.lowercased -f en -e fr -alignment
> grow-diag-final-and -reordering msd-bidirectional-fe -giza-option
> m1=5,m2=3,mh=0,m3=3,m4=0
> -external /home/cuongh/CODE/giza-pp -lm
> 0:3:/home/cuongh/mosesdecoder/corpus.lowercased.fr.lm
>
> I guess the commands themselves are OK and the problem is the corpus.
> So please suggest some common errors in training data that people here
> have run into and that could be the cause.
> Thanks and best regards,
> C. Hoang
> On Tue, Dec 11, 2012 at 11:43 PM, Cuong Hoang <[email protected]>
> wrote:
>>
>> Hi all,
>> I am training Moses on a slightly noisy corpus, meaning that only around
>> 90%-95% of the sentence pairs are true bilingual pairs. I am running into
>> some quite disturbing errors.
>>
>> When I use GIZA++, I get errors such as:
>> alignment point (28,29) out of range (0-14,0-9) in line 468517, ignoring
>> alignment point (30,31) out of range (0-14,0-9) in line 468517, ignoring
>> alignment point (31,31) out of range (0-14,0-9) in line 468517, ignoring
>> alignment point (31,32) out of range (0-14,0-9) in line 468517, ignoring
>>
>> and later on, for example:
>>
>> WARNING: sentence 415878 has alignment point (6, 7) out of bounds (7, 7)
>> T: øksnes là một đô_thị ở hạt nordland
>> S: øksnes is a municipality in nordland county
>>
>> When I use MGIZA++ instead of GIZA++, I get errors such as:
>>
>> Use of uninitialized value in substitution ...
>> Use of uninitialized value $a in split
>> or
>> Use of uninitialized value $a in scalar chomp at
>> scripts/training/LexicalTranslationModel.pm
>>
>> If you can, please give me any suggestions for this problem.
>> Thanks and best regards,
>> C. Hoang
>> --
>> Best Regards,
>> C. Hoang
>> SMT Nerd
>>
>
>
>
> --
> Best Regards,
> C. Hoang
> SMT Nerd
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
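
For the "hard way" check discussed in this thread, here is a minimal Python sketch that compares each alignment line against the word counts of the corresponding source and target sentences. It is only an illustration, not part of Moses: the function name find_bad_alignment_points is made up, and it assumes the alignment file holds one line per sentence pair with 0-based "source-target" index pairs (e.g. "0-0 1-2") and that sentences are tokenized on whitespace.

```python
def find_bad_alignment_points(src_lines, tgt_lines, align_lines):
    """Yield (line_no, (s, t), src_len, tgt_len) for every out-of-range point.

    Hypothetical helper: assumes one sentence per line and alignment points
    written as 0-based "s-t" pairs separated by spaces.
    """
    triples = zip(src_lines, tgt_lines, align_lines)
    for line_no, (src, tgt, align) in enumerate(triples, start=1):
        src_len = len(src.split())  # number of source words
        tgt_len = len(tgt.split())  # number of target words
        for point in align.split():
            s, t = (int(x) for x in point.split("-"))
            # A valid point must index an existing word on both sides.
            if not (0 <= s < src_len and 0 <= t < tgt_len):
                yield line_no, (s, t), src_len, tgt_len


# Example using the sentence pair from the warning quoted in this thread:
# both sides have 7 words, so the point 6-7 is out of range on the target side.
src = ["øksnes is a municipality in nordland county"]
tgt = ["øksnes là một đô_thị ở hạt nordland"]
ali = ["0-0 6-7"]
for line_no, point, src_len, tgt_len in find_bad_alignment_points(src, tgt, ali):
    print(f"line {line_no}: point {point} out of range "
          f"(0-{src_len - 1},0-{tgt_len - 1})")
```

To run it on real data, pass open file handles for the two corpus files and the alignment file instead of the lists. Note that zip stops at the shortest input, so it is worth comparing the line counts of the three files separately first; a difference there already points to the filtering or cleaning mismatch described in the reply.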
