Hi,
thanks for your advice. Cleaning the working directory did the trick. Unfortunately, the model is now too large: I cannot translate in reasonable time, as the model doesn't fit in memory and the swap file is really slow. Now it's time for pruning.

Yours,
Per Tunedal
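[Editor's note: a minimal sketch of the kind of threshold pruning meant here, keeping only phrase-table entries whose direct phrase probability p(e|f) clears a cutoff. The file names are hypothetical, and which score column holds p(e|f) depends on the training configuration; it is assumed here to be the third of four scores.]

```python
import gzip

def prune_phrase_table(in_path, out_path, min_prob=0.001, score_col=2):
    """Copy a gzipped Moses-style phrase table, dropping low-probability entries.

    Each line looks like: "src ||| tgt ||| scores ||| alignment ||| counts".
    Returns (kept, dropped) counts.
    """
    kept = dropped = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for line in fin:
            fields = line.split(" ||| ")
            scores = fields[2].split()
            if float(scores[score_col]) >= min_prob:
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

This is a blunt instrument (it ignores counts and significance); it only illustrates the idea of shrinking the table before decoding.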
On Wed, Apr 3, 2013, at 9:40, Barry Haddow wrote:
> Hi Per
>
> You get these warnings:
>
> "has alignment point (15, 19) out of bounds (15, WARNING: sentence
> 2448049)"
>
> when your alignments don't match your corpus. Most likely you have
> accidentally reused alignments from another run. The fast training is
> also a sign that something went wrong.
>
> Try again with a clean working directory.
>
> cheers - Barry
>
>
> On 03/04/13 07:43, Per Tunedal wrote:
> > Hi,
> > Inspired by the paper "Does more data always yield better translations?"
> > @ aclweb.org/anthology-new/E/E12/E12-1016.pdf, which Ken Fasano kindly
> > linked to, I've experimented a great deal.
> >
> > I've tested several ways to pick a good sample of sentences from the
> > Europarl corpus, picking 10% of the sentences. I thought I had just
> > found a promising method and decided to pick a larger sample, 35%,
> > expecting a much improved translation. On the contrary, the
> > translation of my test text was terrible. It was turned into garbage.
> > Completely useless.
> >
> > I trained the phrase model with:
> > nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir
> > train -corpus ~/corpora/Total1.sv-fr.clean_urval -f sv -e fr -alignment
> > grow-diag-final-and -reordering msd-bidirectional-fe -lm
> > 0:3:$HOME/lm/Total1.blm.fr:8 -external-bin-dir ~/mosesdecoder/tools
> > -parallel -cores 4 -score-options --GoodTuring >& training.out &
> >
> > The training was incredibly fast, in spite of the larger training
> > corpus.
> > After the line stating that moses.ini was created, I found lots of
> > warnings of the type:
> > "has alignment point (15, 19) out of bounds (15, WARNING: sentence
> > 2448049)"
> >
> > Furthermore, the model (= the model folder) is very small: 277 MB,
> > phrase-table.gz: 83 MB.
> > The previous training with the same sampling method (only 10% of
> > Europarl) yielded 495 MB, phrase-table.gz: 173 MB.
> >
> > Why this strange result?
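[Editor's note: the "out of bounds" warning means an alignment point refers to a token position beyond the end of a sentence, which happens when a stale alignment file is paired with a different corpus. A minimal sketch (hypothetical file names) of checking a GIZA++-style `i-j` alignment file against the two sides of a corpus:]

```python
def check_alignments(src_path, tgt_path, align_path):
    """Yield (line_no, (i, j)) for every out-of-bounds alignment point.

    Each alignment line holds space-separated "i-j" pairs, where i indexes
    a source token and j a target token, both 0-based.
    """
    with open(src_path) as src, open(tgt_path) as tgt, open(align_path) as aln:
        for line_no, (s, t, a) in enumerate(zip(src, tgt, aln), start=1):
            src_len = len(s.split())
            tgt_len = len(t.split())
            for pair in a.split():
                i, j = map(int, pair.split("-"))
                if i >= src_len or j >= tgt_len:
                    yield line_no, (i, j)
```

If this reports anything at all, the alignments and the corpus are out of sync, and sentence pairs with bad points are skipped during phrase extraction, which would also explain the shrunken phrase table.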
> > I suppose it has something to do with how the phrases are actually
> > extracted by Moses. The simple explanation "phrases that are
> > consistent with the word alignment" doesn't tell me enough.
> > Besides, I don't fully understand what it means. Maybe a very simple
> > example would make me understand the process.
> >
> > Yours,
> > Per Tunedal
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
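[Editor's note: a toy illustration, not Moses code, of "phrases consistent with the word alignment": a source span and target span form a phrase pair if no alignment point links a word inside the pair to a word outside it, and the pair covers at least one alignment point. This simplified version does not extend phrase pairs over unaligned boundary words, as the full extraction algorithm does.]

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """src/tgt: lists of tokens; alignment: set of 0-based (i, j) pairs."""
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions aligned to anything in the source span.
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue  # must cover at least one alignment point
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may be aligned
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            phrases.add((" ".join(src[i1:i2 + 1]),
                         " ".join(tgt[j1:j2 + 1])))
    return phrases
```

For example, with src = ["das", "Haus"], tgt = ["the", "house"] and alignment {(0, 0), (1, 1)}, this yields the three pairs ("das", "the"), ("Haus", "house") and ("das Haus", "the house"). A crossing alignment point would block the smaller pairs but still allow the larger consistent one.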
