Hi Per

You get these warnings:
"has alignment point (15, 19) out of bounds (15, WARNING: sentence 2448049)"
when your alignments don't match your corpus. Most likely you have
accidentally reused alignments from another run. The fast training is
also a sign that something went wrong. Try again with a clean working
directory.

cheers - Barry

On 03/04/13 07:43, Per Tunedal wrote:
> Hi,
> Inspired by the paper "Does more data always yield better translations?"
> @ aclweb.org/anthology-new/E/E12/E12-1016.pdf, that Ken Fasano kindly
> linked to, I've experimented a great deal.
>
> I've tested several ways to pick a good sample of sentences from the
> Europarl corpus, picking 10 % of the sentences. I thought I had
> found a promising method and decided to pick a larger sample, 35 %, and
> expected a very much improved translation. On the contrary, the
> translation of my test-text was terrible. It was turned into garbage.
> Completely useless.
>
> I trained the phrase model with:
> nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir
> train -corpus ~/corpora/Total1.sv-fr.clean_urval -f sv -e fr -alignment
> grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:$HOME/lm/Total1.blm.fr:8 -external-bin-dir ~/mosesdecoder/tools
> -parallel -cores 4 -score-options --GoodTuring >& training.out &
>
> The training was incredibly fast, in spite of the larger training
> corpus.
> After the line stating that moses.ini was created I found lots of
> warnings of the type:
> "has alignment point (15, 19) out of bounds (15, WARNING: sentence
> 2448049)"
>
> Further, the model (= the model folder) is very small: 277 MB,
> phrase-table.gz: 83 MB.
> The previous training with the same sample method (only 10% of the
> Europarl) yielded: 495 MB, phrase-table.gz: 173 MB
>
> Why this strange result? I suppose it has something to do with how the
> phrases actually are extracted by Moses. The simple explanation "phrases
> that are consistent with the word alignment" doesn't tell me enough.
> Besides, I don't fully understand what it means. Maybe a very simple
> example would make me understand the process.
>
> Yours,
> Per Tunedal
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
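
To check the diagnosis given in the reply (alignments that do not match the corpus), a small script along the following lines could flag the offending sentence pairs before training is rerun. This is only a sketch under assumptions: it presumes the alignment file holds one line per sentence pair in the "i-j" form, with 0-based source and target word indices, and that the corpus is whitespace-tokenised. The function name and the sample sentences are invented for illustration and are not part of Moses.

```python
def find_out_of_bounds(src_lines, tgt_lines, align_lines):
    """Return (sentence_no, point, (src_len, tgt_len)) for each bad point.

    Assumes "i-j" alignment pairs with 0-based word indices (an assumption
    about the file format, see the note above). Sentence numbers are 1-based.
    zip() stops at the shortest input, so compare line counts separately.
    """
    problems = []
    triples = zip(src_lines, tgt_lines, align_lines)
    for n, (src, tgt, aln) in enumerate(triples, start=1):
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        for pair in aln.split():
            i, j = (int(x) for x in pair.split("-"))
            # A point is valid only if both indices fall inside the sentences.
            if i >= src_len or j >= tgt_len:
                problems.append((n, (i, j), (src_len, tgt_len)))
    return problems


# Toy data: the second alignment line belongs to some other, longer sentence.
src = ["jag ser huset", "det regnar"]
tgt = ["je vois la maison", "il pleut"]
aln = ["0-0 1-1 2-2 2-3", "0-0 1-1 4-5"]

bad = find_out_of_bounds(src, tgt, aln)
print(bad)
```

If a check like this reports many bad points, or the three files differ in line count, the alignment file was most likely produced from a different corpus, which fits the advice to start again in a clean working directory.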
