I don't think so. However, when you repeat those experiments, you might 
try to identify where two training runs start to diverge by pairwise 
comparisons of the same files between the two runs. Maybe then we can 
deduce something.
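
For instance, something like the sketch below (just an illustration: the 
run1/ and run2/ directories stand for two training runs kept side by 
side, and the file names follow the baseline-tutorial layout from your 
commands, so adjust them as needed):

#!/bin/bash
# Compare the same intermediate file from two otherwise identical runs.
# Gzipped files are decompressed first, so the timestamp in the gzip
# header cannot cause a spurious byte-level mismatch.
compare () {
    f=$1
    case "$f" in
        *.gz) diff <(zcat "run1/$f") <(zcat "run2/$f") > /dev/null ;;
        *)    cmp -s "run1/$f" "run2/$f" ;;
    esac && echo "identical: $f" || echo "DIFFERS:   $f"
}

# Roughly in pipeline order, so the first DIFFERS line hints at where
# the two runs start to diverge.
compare corpus/train.clean.en
compare corpus/train.clean.sm
compare working/train/model/lex.e2f
compare working/train/model/phrase-table.gz
compare working/train/model/moses.ini

The first file that differs points at the training step where the runs 
stop being identical.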

On 23.06.2015 00:25, Hokage Sama wrote:
> Hi, I delete all the files (I think) generated during a training job 
> before rerunning the entire training. Do you think this could cause 
> variation? Here are the commands I run to delete them:
>
> rm ~/corpus/train.tok.en
> rm ~/corpus/train.tok.sm
> rm ~/corpus/train.true.en
> rm ~/corpus/train.true.sm
> rm ~/corpus/train.clean.en
> rm ~/corpus/train.clean.sm
> rm ~/corpus/truecase-model.en
> rm ~/corpus/truecase-model.sm
> rm ~/corpus/test.tok.en
> rm ~/corpus/test.tok.sm
> rm ~/corpus/test.true.en
> rm ~/corpus/test.true.sm
> rm -rf ~/working/filtered-test
> rm ~/working/test.out
> rm ~/working/test.translated.en
> rm ~/working/training.out
> rm -rf ~/working/train/corpus
> rm -rf ~/working/train/giza.en-sm
> rm -rf ~/working/train/giza.sm-en
> rm -rf ~/working/train/model
>
> On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>
>     You're welcome. Take another close look at those varying BLEU
>     scores though. That would worry me if it happened to me with the
>     same data and the same weights.
>
>     On 22.06.2015 10:31, Hokage Sama wrote:
>
>         Ok thanks. Appreciate your help.
>
>         On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt
>         <junc...@amu.edu.pl> wrote:
>
>             Difficult to tell with that little data. Once you get beyond
>             100,000 segments (or 50,000 at least) I would say 2000 per dev
>             (for tuning) and test set, rest for training. With that few
>             segments it's hard to give you any recommendations, since it
>             might just not give meaningful results. It's currently a toy
>             model, good for learning and playing around with options, but
>             not good for trying to infer anything from BLEU scores.
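
(To make that concrete: a minimal sketch of such a split, assuming a 
shuffled, sentence-aligned pair of files corpus.en/corpus.sm; the file 
names and the 2000/2000 sizes are placeholders following the numbers 
above.)

# hold out 2000 segments each for tuning (dev) and testing,
# keep the remainder for training
head -n 2000 corpus.en > dev.en
head -n 2000 corpus.sm > dev.sm
sed -n '2001,4000p' corpus.en > test.en
sed -n '2001,4000p' corpus.sm > test.sm
tail -n +4001 corpus.en > train.en
tail -n +4001 corpus.sm > train.sm
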
>
>
>             On 22.06.2015 10:17, Hokage Sama wrote:
>
>                 Yes, the language model was built earlier when I first went
>                 through the manual to build a French-English baseline system,
>                 so I just reused it for my Samoan-English system.
>                 Yes, for all three runs I used the same training and testing
>                 files.
>                 How can I determine how much parallel data I should set aside
>                 for tuning and testing? I have only 10,028 segments (198,385
>                 words) altogether. At the moment I'm using 259 segments for
>                 testing and the rest for training.
>
>                 Thanks,
>                 Hilton
>
>
>
>
>

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
