I've never done it. There's a limit on the maximum number of shards when parallelizing the extract and scoring steps during training.
I've upped the limit (from 99,999) to 9,999,999:
https://github.com/moses-smt/mosesdecoder/commit/f95a1bb75b2add5b7dcd1e3e5c76777f2f141e21

Other than that, I can't think of any other issues. To minimize disk space usage (and probably increase speed too), compress the intermediate training files. Also, optimize the sorting. These are my arguments to train-model.perl to do these things (a fuller example invocation is sketched after the quoted message below):

  ..../train-model.perl -sort-buffer-size 1G -sort-batch-size 253 -sort-compress gzip -cores 8

On 20 June 2014 09:50, Tom Hoar <[email protected]> wrote:

> Does anyone have experience (words-of-wisdom) training the translation
> model from a parallel corpus with 2.25 trillion phrase pairs and over
> 45 trillion tokens?
>
> Thanks,
> Tom
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
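For context, here is a sketch of how those sort options might slot into a complete train-model.perl invocation, following the standard Moses baseline recipe. The corpus path, language pair (fr-en), alignment heuristic, reordering configuration, LM file and tool directories below are placeholders I have assumed; only -cores and the -sort-* options come from the message above.

  # sketch only: substitute your own corpus, language pair, LM and paths
  nohup nice /path/to/mosesdecoder/scripts/training/train-model.perl \
    -root-dir train -corpus corpus/train.clean -f fr -e en \
    -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
    -lm 0:3:/path/to/lm.blm:8 \
    -external-bin-dir /path/to/giza-bin \
    -cores 8 \
    -sort-buffer-size 1G -sort-batch-size 253 -sort-compress gzip \
    >& training.out &

As I understand it, the -sort-* settings are handed through to the GNU sort calls run over the intermediate extract/score files, giving sort a larger in-memory buffer, a larger merge batch size, and gzip-compressed temporary files.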
