Hi Per

You get these warnings:

"has alignment point (15, 19) out of bounds (15, WARNING: sentence
2448049)"

when your alignments don't match your corpus: an alignment point refers
to a word position that lies beyond the end of the sentence. Most likely
you have accidentally reused alignments from another run. The unusually
fast training is also a sign that something went wrong.

Try again with a clean working directory.
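
If you want to confirm the mismatch before retraining, a small script
along these lines should do it. This is only a sketch, not part of
Moses: the function names are made up for illustration, and it assumes
whitespace-tokenised corpus files plus an alignment file with one line
per sentence pair in the usual "srcIndex-tgtIndex" format with 0-based
indices (for example train/model/aligned.grow-diag-final-and, if your
run used that layout).

```python
def out_of_bounds_points(src_line, tgt_line, align_line):
    """Return the alignment points that fall outside the sentence pair.

    Assumes whitespace-separated tokens and 0-based "s-t" alignment points.
    """
    n_src = len(src_line.split())
    n_tgt = len(tgt_line.split())
    bad = []
    for point in align_line.split():
        s, t = (int(x) for x in point.split("-"))
        if s >= n_src or t >= n_tgt:
            bad.append((s, t))
    return bad


def check_files(src_path, tgt_path, align_path):
    """Yield (line_number, bad_points) for each sentence pair with a mismatch.

    Line numbers are 1-based. zip() stops at the shortest file, so a
    difference in line counts has to be checked separately (e.g. with wc -l).
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(align_path, encoding="utf-8") as ali:
        for number, (s, t, a) in enumerate(zip(src, tgt, ali), start=1):
            bad = out_of_bounds_points(s, t, a)
            if bad:
                yield number, bad
```

Calling check_files() on the .sv file, the .fr file and the alignment
file, and printing what it yields, lists the affected sentence pairs.
If many pairs show up, the alignments almost certainly belong to a
different corpus.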

cheers - Barry


On 03/04/13 07:43, Per Tunedal wrote:
> Hi,
> Inspired by the paper "Does more data always yield better translations?"
> @ aclweb.org/anthology-new/E/E12/E12-1016.pdf, that Ken Fasano kindly
> linked to, I've experimented a great deal.
>
> I've tested several ways to pick a good sample of sentences from the
> Europarl corpus, picking 10 % of the sentences. I thought I had
> found a promising method and decided to pick a larger sample, 35 %, and
> expected a very much improved translation. On the contrary, the
> translation of my test-text was terrible. It was turned into garbage.
> Completely useless.
>
> I trained the phrase model with:
> nohup nice ~/mosesdecoder/scripts/training/train-model.perl -root-dir
> train -corpus ~/corpora/Total1.sv-fr.clean_urval -f sv -e fr -alignment
> grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:$HOME/lm/Total1.blm.fr:8 -external-bin-dir ~/mosesdecoder/tools
> -parallel -cores 4 -score-options --GoodTuring >& training.out &
>
> The training was incredibly fast, in spite of the larger training
> corpus.
> After the line stating that moses.ini was created I found lots of
> warnings of the type:
> "has alignment point (15, 19) out of bounds (15, WARNING: sentence
> 2448049)"
>
> Further the model (= the model folder) is very small: 277 MB,
> phrase-table.gz: 83 MB.
> The previous training with the same sample method (only 10% of the
> Europarl) yielded: 495 MB, phrase-table.gz: 173 MB
>
> Why this strange result? I suppose it has something to do with how the
> phrases actually are extracted by Moses. The simple explanation "phrases
> that are consistent with the word alignment" doesn't tell me enough.
> Besides, I don't fully understand what it means. Maybe a very simple
> example would make me understand the process.
>
> Yours,
> Per Tunedal
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

