Hi, I delete all the files (I think) generated during a training job before rerunning the entire training. Do you think this could cause variation? Here are the commands I run to delete them:
rm ~/corpus/train.tok.en
rm ~/corpus/train.tok.sm
rm ~/corpus/train.true.en
rm ~/corpus/train.true.sm
rm ~/corpus/train.clean.en
rm ~/corpus/train.clean.sm
rm ~/corpus/truecase-model.en
rm ~/corpus/truecase-model.sm
rm ~/corpus/test.tok.en
rm ~/corpus/test.tok.sm
rm ~/corpus/test.true.en
rm ~/corpus/test.true.sm
rm -rf ~/working/filtered-test
rm ~/working/test.out
rm ~/working/test.translated.en
rm ~/working/training.out
rm -rf ~/working/train/corpus
rm -rf ~/working/train/giza.en-sm
rm -rf ~/working/train/giza.sm-en
rm -rf ~/working/train/model

On 22 June 2015 at 03:35, Marcin Junczys-Dowmunt <[email protected]> wrote:

> You're welcome. Take another close look at those varying bleu scores
> though. That would make me worry if it happened to me for the same data and
> the same weights.
>
> On 22.06.2015 10:31, Hokage Sama wrote:
>
>> Ok thanks. Appreciate your help.
>>
>> On 22 June 2015 at 03:22, Marcin Junczys-Dowmunt <[email protected]> wrote:
>>
>>     Difficult to tell with that little data. Once you get beyond
>>     100,000 segments (or 50,000 at least) i would say 2000 per dev
>>     (for tuning) and test set, rest for training. With that few
>>     segments it's hard to give you any recommendations since it might
>>     just not give meaningful results. It's currently a toy model, good
>>     for learning and playing around with options. But not good for
>>     trying to infer anything from BLEU scores.
>>
>>
>>     On 22.06.2015 10:17, Hokage Sama wrote:
>>
>>         Yes the language model was built earlier when I first went
>>         through the manual to build a French-English baseline system.
>>         So I just reused it for my Samoan-English system.
>>         Yes for all three runs I used the same training and testing files.
>>         How can I determine how much parallel data I should set aside
>>         for tuning and testing? I have only 10,028 segments (198,385
>>         words) altogether. At the moment I'm using 259 segments for
>>         testing and the rest for training.
>>
>>         Thanks,
>>         Hilton
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
