I'd love to hear input from others. I am working on a low-resource Chavacano corpus too.
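For the split question, here is a minimal shell sketch of one common approach with scarce data: hold out two small fixed-size sets (tuning and test) and train on the rest, shuffling the paired segments so alignment is preserved. The file names, the 170-segment hold-outs (roughly 5% each), and the synthetic stand-in corpus are my assumptions for illustration, not anything from the original thread.

```shell
# Synthetic stand-in corpus so the sketch runs end to end;
# replace with your real aligned corpus.ru / corpus.en.
seq 1 3387 | sed 's/^/src line /' > corpus.ru
seq 1 3387 | sed 's/^/tgt line /' > corpus.en

# Pair the two sides before shuffling so segments stay aligned.
# --random-source makes the shuffle deterministic (GNU shuf).
paste corpus.ru corpus.en | shuf --random-source=corpus.ru > paired.shuf

TUNE=170   # held out for MERT tuning (assumed size)
TEST=170   # held out for final evaluation (assumed size)

head -n "$TUNE" paired.shuf                            > tune.both
head -n $((TUNE + TEST)) paired.shuf | tail -n "$TEST" > test.both
tail -n +$((TUNE + TEST + 1)) paired.shuf              > train.both

# Unpack each split back into source/target files.
for split in tune test train; do
  cut -f1 "$split.both" > "$split.ru"
  cut -f2 "$split.both" > "$split.en"
done

wc -l train.ru tune.ru test.ru
```

With only 3,387 segments, keeping the tuning and test sets small leaves as much material as possible for training, at the cost of noisier BLEU estimates on the small test set.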
On Wed, Mar 21, 2018 at 1:29 AM, Petro ORYNYCZ-GLEASON <[email protected]> wrote:
> Dear Colleagues,
>
> We are using Moses to revitalize Lemko, an endangered low-resource
> language. We have 70,000 Lemko words in 3,387 segments, carefully
> translated into native English and perfectly aligned.
>
> Our current BLEU score is about 0.10.
>
> As far as hardware goes, we're using the cloud: an Amazon EC2 p2.xlarge
> instance (1 GPU, 4 vCPUs, 61 GiB RAM).
>
> Questions:
>
> - How should we divide our precious 3,387 bilingual segments into
>   training, tuning, and testing data? What ratio is ideal?
> - Considering that at this point bilingual content is much dearer to
>   us than processing power (Amazon AWS costs us USD 0.90 per hour,
>   while translation costs us USD 0.15 per word), how do we make the
>   most of what we've got?
> - Is there anything we could do, beyond the default settings, that
>   might lead to a large improvement in the BLEU score?
>
> Current training command:
>
> ~/workspace/mosesdecoder/scripts/training/train-model.perl \
>     --parallel --mgiza-cpus 4 \
>     -root-dir train \
>     --corpus ~/corpus/train.ru-en.clean \
>     --f ru --e en \
>     --alignment grow-diag-final-and \
>     --reordering msd-bidirectional-fe \
>     --lm 0:3:/home/ubuntu/lm/train.ru-en.blm.en:8 \
>     -external-bin-dir ~/workspace/bin/training-tools/mgizapp
>
> Current tuning command:
>
> ~/workspace/mosesdecoder/scripts/training/mert-moses.pl \
>     ~/corpus/tune.ru-en.true.ru ~/corpus/tune.ru-en.true.en \
>     ~/workspace/mosesdecoder/bin/moses ~/working/train/model/moses.ini \
>     --mertdir ~/workspace/mosesdecoder/bin/ \
>     --decoder-flags="-threads 4"
>
> Thanks for your help!
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
