Would love to hear input from others. I am working on a low-resource
Chavacano corpus too.
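On the split question below: with only a few thousand segments, a common rule of thumb is to hold out a few hundred segments each for tuning and testing and train on the rest. A minimal sketch of a random split (the function name, seed, and held-out sizes of 500 are illustrative, not prescribed by Moses):

```python
import random

def split_corpus(segments, tune_size=500, test_size=500, seed=42):
    """Shuffle segment indices and carve out held-out tune/test sets,
    keeping the remainder for training."""
    idx = list(range(len(segments)))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    tune = [segments[i] for i in idx[:tune_size]]
    test = [segments[i] for i in idx[tune_size:tune_size + test_size]]
    train = [segments[i] for i in idx[tune_size + test_size:]]
    return train, tune, test

# With 3,387 segments, holding out 500 for tuning and 500 for testing
# leaves 2,387 for training.
segments = [f"seg{i}" for i in range(3387)]
train, tune, test = split_corpus(segments)
print(len(train), len(tune), len(test))  # 2387 500 500
```

Shuffling before splitting matters if the corpus is ordered by topic or document, so each partition sees similar material; keeping the seed fixed lets you rerun training against the same held-out sets.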

On Wed, Mar 21, 2018 at 1:29 AM, Petro ORYNYCZ-GLEASON <[email protected]> wrote:

> Dear Colleagues,
> We are using Moses to revitalize Lemko, an endangered low-resource
> language. We have 70,000 Lemko words in 3,387 segments perfectly
> translated into native English and perfectly aligned.
> Current BLEU score is about 0.10.
> As far as hardware goes, we're using the cloud: Amazon EC2 p2.xlarge
> (1 GPU, 4 vCPUs, 61 GiB RAM).
> Questions:
> - How should we divide our precious 3,387 bilingual segments into
> training, tuning, and testing data? What ratio is ideal?
> - Considering that at this point, bilingual content is much dearer to
> us than processing power (Amazon AWS costs us USD 0.90 per hour, while
> translation costs us USD 0.15 per word), how do we make the most of
> what we've got?
> - Is there anything we could do other than the default settings that
> might lead to a large improvement in the BLEU score?
>
> Current training model:
> ~/workspace/mosesdecoder/scripts/training/train-model.perl \
> --parallel --mgiza-cpus 4 \
> -root-dir train \
> --corpus ~/corpus/train.ru-en.clean \
> --f ru --e en \
> --alignment grow-diag-final-and \
> --reordering msd-bidirectional-fe \
> --lm 0:3:/home/ubuntu/lm/train.ru-en.blm.en:8 \
> -external-bin-dir ~/workspace/bin/training-tools/mgizapp
>
> Current tuning model:
> ~/workspace/mosesdecoder/scripts/training/mert-moses.pl \
> ~/corpus/tune.ru-en.true.ru ~/corpus/tune.ru-en.true.en \
> ~/workspace/mosesdecoder/bin/moses ~/working/train/model/moses.ini \
> --mertdir ~/workspace/mosesdecoder/bin/ \
> --decoder-flags="-threads 4"
>
> Thanks for your help!
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>