Re: [Moses-support] BLEU Score Variance: Which score to use?

Hokage Sama Mon, 22 Jun 2015 15:28:56 -0700

Hi, I delete all the files (I think) generated during a training job before
rerunning the entire training. Do you think this could cause the variation?
Here are the commands I run to delete them:


rm ~/corpus/train.tok.en
rm ~/corpus/train.tok.sm
rm ~/corpus/train.true.en
rm ~/corpus/train.true.sm
rm ~/corpus/train.clean.en
rm ~/corpus/train.clean.sm
rm ~/corpus/truecase-model.en
rm ~/corpus/truecase-model.sm
rm ~/corpus/test.tok.en
rm ~/corpus/test.tok.sm
rm ~/corpus/test.true.en
rm ~/corpus/test.true.sm
rm -rf ~/working/filtered-test
rm ~/working/test.out
rm ~/working/test.translated.en
rm ~/working/training.out
rm -rf ~/working/train/corpus
rm -rf ~/working/train/giza.en-sm
rm -rf ~/working/train/giza.sm-en
rm -rf ~/working/train/model
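
The same cleanup can be written as one function, which makes it easier to see at a glance whether any generated file is being missed between runs. This is only a sketch: the function name clean_moses_run and its base-directory argument are made up for illustration and are not part of Moses. It removes exactly the paths listed above and nothing else.

```shell
#!/bin/sh
# Sketch only: clean_moses_run is a hypothetical helper, not a Moses tool.
# It removes the same generated files as the rm commands listed above,
# relative to a base directory passed as the first argument.
clean_moses_run() {
    base="${1:?usage: clean_moses_run BASE_DIR}"

    # Tokenised, truecased and cleaned corpora, plus the truecase models.
    for lang in en sm; do
        for stem in train.tok train.true train.clean test.tok test.true; do
            rm -f "$base/corpus/$stem.$lang"
        done
        rm -f "$base/corpus/truecase-model.$lang"
    done

    # Decoder and training logs/outputs.
    rm -f "$base/working/test.out" \
          "$base/working/test.translated.en" \
          "$base/working/training.out"

    # Directories produced by filtering and by training.
    rm -rf "$base/working/filtered-test" \
           "$base/working/train/corpus" \
           "$base/working/train/giza.en-sm" \
           "$base/working/train/giza.sm-en" \
           "$base/working/train/model"
}
```

To match the commands above it would be called as clean_moses_run "$HOME". Using rm -f means a file that is already gone does not produce an error.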

On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <[email protected]> wrote:

> You're welcome. Take another close look at those varying BLEU scores
> though. That would make me worry if it happened to me for the same data and
> the same weights.
>
> On 22.06.2015 10:31, Hokage Sama wrote:
>
>> Ok thanks. Appreciate your help.
>>
>> On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt <[email protected]> wrote:
>>
>>     Difficult to tell with that little data. Once you get beyond
>>     100,000 segments (or at least 50,000), I would say 2,000 each for
>>     the dev (tuning) set and the test set, and the rest for training.
>>     With that few segments it's hard to give you any recommendations,
>>     since it might just not give meaningful results. It's currently a
>>     toy model, good for learning and playing around with options, but
>>     not good for trying to infer anything from BLEU scores.
>>
>>
>>     On 22.06.2015 10:17, Hokage Sama wrote:
>>
>>         Yes the language model was built earlier when I first went
>>         through the manual to build a French-English baseline system.
>>         So I just reused it for my Samoan-English system.
>>         Yes for all three runs I used the same training and testing files.
>>         How can I determine how much parallel data I should set aside
>>         for tuning and testing? I have only 10,028 segments (198,385
>>         words) altogether. At the moment I'm using 259 segments for
>>         testing and the rest for training.
>>
>>         Thanks,
>>         Hilton
>>
>>
>>
>>
>
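
For reference, the dev/test/train split discussed in the quoted messages above could be done along these lines. This is a rough sketch under several assumptions: split_parallel is a made-up helper name, the two input files are line-aligned, their extensions are the language codes, and the set sizes are plain arguments rather than a recommendation for this corpus. It takes contiguous blocks from the top of both files, so an ordered corpus would need shuffling (both sides together) beforehand.

```shell
#!/bin/sh
# Sketch only: split_parallel is a hypothetical helper, not a Moses script.
# Arguments: SRC TGT N_DEV N_TEST OUT_DIR
# The same line ranges are cut from both files, so sentence pairs stay aligned.
split_parallel() {
    src="$1"; tgt="$2"; n_dev="$3"; n_test="$4"; out="$5"

    for f in "$src" "$tgt"; do
        ext="${f##*.}"   # language code taken from the file extension
        # First n_dev lines: tuning (dev) set.
        head -n "$n_dev" "$f" > "$out/dev.$ext"
        # Next n_test lines: test set.
        tail -n "+$((n_dev + 1))" "$f" | head -n "$n_test" > "$out/test.$ext"
        # Everything after that: training set.
        tail -n "+$((n_dev + n_test + 1))" "$f" > "$out/train.$ext"
    done
}
```

With files named, say, all.sm and all.en (hypothetical names), split_parallel all.sm all.en 2000 2000 ~/corpus would write dev, test and train files for both languages into ~/corpus.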

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
