Hi,
thanks for your advice. Cleaning the working directory did the trick. Unfortunately, the model is now too large: I cannot translate in reasonable time, as the model doesn't fit in memory and the swap file is really slow. Now it's time for pruning.

Yours,
Per Tunedal
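[Editor's note: a minimal sketch of the kind of threshold pruning meant here, keeping only phrase-table entries whose direct phrase probability p(e|f) clears a cutoff. The file names are hypothetical, and which score column holds p(e|f) depends on the training configuration; it is assumed here to be the third of four scores.]

```python
import gzip

def prune_phrase_table(in_path, out_path, min_prob=0.001, score_col=2):
    """Copy a gzipped Moses-style phrase table, dropping low-probability entries.

    Each line looks like: "src ||| tgt ||| scores ||| alignment ||| counts".
    Returns (kept, dropped) counts.
    """
    kept = dropped = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        for line in fin:
            fields = line.split(" ||| ")
            scores = fields[2].split()
            if float(scores[score_col]) >= min_prob:
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

This is a blunt instrument (it ignores counts and significance); it only illustrates the idea of shrinking the table before decoding.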
On Wed, Apr 3, 2013, at 9:40, Barry Haddow wrote:
> Hi Per
>
> You get these warnings:
>
> "has alignment point (15, 19) out of bounds (15, WARNING: sentence
> 2448049)"
>
> when your alignments don't match your corpus. Most likely you have
> accidentally reused alignments from another run. The fast training is
> also a sign that something went wrong.
>
> Try again with a clean working directory.
>
> cheers - Barry
>
>
> On 03/04/13 07:43, Per Tunedal wrote:
> > Hi,
> > Inspired by the paper "Does more data always yield better translations?"
> > @ aclweb.org/anthology-new/E/E12/E12-1016.pdf, which Ken Fasano kindly
> > linked to, I've experimented a great deal.
> >
> > I've tested several ways to pick a good sample of sentences from the
> > Europarl corpus, picking 10% of the sentences. I thought I had just
> > found a promising method and decided to pick a larger sample, 35%,
> > expecting a much improved translation. On the contrary, the
> > translation of my test text was terrible. It was turned into garbage.
> > Completely useless.
> >
> > I trained the phrase model with:
> > nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir
> > train -corpus ~/corpora/Total1.sv-fr.clean_urval -f sv -e fr -alignment
> > grow-diag-final-and -reordering msd-bidirectional-fe -lm
> > 0:3:$HOME/lm/Total1.blm.fr:8 -external-bin-dir ~/mosesdecoder/tools
> > -parallel -cores 4 -score-options --GoodTuring >& training.out &
> >
> > The training was incredibly fast, in spite of the larger training
> > corpus.
> > After the line stating that moses.ini was created, I found lots of
> > warnings of the type:
> > "has alignment point (15, 19) out of bounds (15, WARNING: sentence
> > 2448049)"
> >
> > Furthermore, the model (= the model folder) is very small: 277 MB,
> > phrase-table.gz: 83 MB.
> > The previous training with the same sampling method (only 10% of
> > Europarl) yielded 495 MB, phrase-table.gz: 173 MB.
> >
> > Why this strange result?
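[Editor's note: the "out of bounds" warning means an alignment point refers to a token position beyond the end of a sentence, which happens when a stale alignment file is paired with a different corpus. A minimal sketch (hypothetical file names) of checking a GIZA++-style `i-j` alignment file against the two sides of a corpus:]

```python
def check_alignments(src_path, tgt_path, align_path):
    """Yield (line_no, (i, j)) for every out-of-bounds alignment point.

    Each alignment line holds space-separated "i-j" pairs, where i indexes
    a source token and j a target token, both 0-based.
    """
    with open(src_path) as src, open(tgt_path) as tgt, open(align_path) as aln:
        for line_no, (s, t, a) in enumerate(zip(src, tgt, aln), start=1):
            src_len = len(s.split())
            tgt_len = len(t.split())
            for pair in a.split():
                i, j = map(int, pair.split("-"))
                if i >= src_len or j >= tgt_len:
                    yield line_no, (i, j)
```

If this reports anything at all, the alignments and the corpus are out of sync, and sentence pairs with bad points are skipped during phrase extraction, which would also explain the shrunken phrase table.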
> > I suppose it has something to do with how the phrases are actually
> > extracted by Moses. The simple explanation "phrases that are
> > consistent with the word alignment" doesn't tell me enough.
> > Besides, I don't fully understand what it means. Maybe a very simple
> > example would make me understand the process.
> >
> > Yours,
> > Per Tunedal
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
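[Editor's note: a toy illustration, not Moses code, of "phrases consistent with the word alignment": a source span and target span form a phrase pair if no alignment point links a word inside the pair to a word outside it, and the pair covers at least one alignment point. This simplified version does not extend phrase pairs over unaligned boundary words, as the full extraction algorithm does.]

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """src/tgt: lists of tokens; alignment: set of 0-based (i, j) pairs."""
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions aligned to anything in the source span.
            js = [j for (i, j) in alignment if i1 <= i <= i2]
            if not js:
                continue  # must cover at least one alignment point
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may be aligned
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            phrases.add((" ".join(src[i1:i2 + 1]),
                         " ".join(tgt[j1:j2 + 1])))
    return phrases
```

For example, with src = ["das", "Haus"], tgt = ["the", "house"] and alignment {(0, 0), (1, 1)}, this yields the three pairs ("das", "the"), ("Haus", "house") and ("das Haus", "the house"). A crossing alignment point would block the smaller pairs but still allow the larger consistent one.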
