Hi Philipp,

Unfortunately I don't have a precise measurement. If anyone knows of a good way to measure the memory use of a process tree in which many processes memory-map the same files, I would be glad to run it.

--Michael

On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> great - that will be very useful.
>
> Since you just ran the comparison - do you have any numbers on "still
> allowed everything to fit into memory", i.e., how much more memory is used
> by running parallel instances?
>
> -phi
>
> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Like some other Moses users, I noticed diminishing returns from running
>> Moses with several threads. To work around this, I added a script to run
>> multiple single-threaded instances of moses instead of one multi-threaded
>> instance. In practice, this sped things up by about 2.5x for 16 cpus and
>> using memory mapped models still allowed everything to fit into memory.
>>
>> If anyone else is interested in using this, you can prefix a moses
>> command with scripts/generic/multi_moses.py. To use multiple instances in
>> mert-moses.pl, specify --multi-moses and control the number of parallel
>> instances with --decoder-flags='-threads N'.
>>
>> Below is a benchmark on WMT fr-en data (2M training sentences, 400M words
>> mono, suffix array PT, compact reordering, 5-gram KenLM) testing default
>> stack decoding vs cube pruning without and with the parallelization script
>> (+multi):
>>
>> ---
>> 1cpu    sent/sec
>> stack       1.04
>> cube        2.10
>> ---
>> 16cpu   sent/sec
>> stack       7.63
>> +multi     12.20
>> cube        7.63
>> +multi     18.18
>> ---
>>
>> --Michael
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
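
For reference, the speedups implied by the benchmark quoted above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the sent/sec figures are copied from the table, while the dictionary layout and the helper names `speedup` and `summarize` are made up for illustration and are not part of Moses or multi_moses.py.

```python
# Throughput in sentences/second, copied from the quoted benchmark table.
ONE_CPU = {"stack": 1.04, "cube": 2.10}
SIXTEEN_CPU = {
    "stack": 7.63,
    "stack+multi": 12.20,
    "cube": 7.63,
    "cube+multi": 18.18,
}


def speedup(new, base):
    """Ratio of two throughputs (hypothetical helper)."""
    return new / base


def summarize():
    """Return the speedup ratios for each search algorithm."""
    rows = {}
    for algo in ("stack", "cube"):
        threaded = SIXTEEN_CPU[algo]          # one multi-threaded instance
        multi = SIXTEEN_CPU[algo + "+multi"]  # several single-threaded instances
        rows[algo] = {
            "threads_vs_1cpu": speedup(threaded, ONE_CPU[algo]),
            "multi_vs_1cpu": speedup(multi, ONE_CPU[algo]),
            "multi_vs_threads": speedup(multi, threaded),
        }
    return rows


if __name__ == "__main__":
    for algo, ratios in summarize().items():
        for name, value in ratios.items():
            print(f"{algo:5s} {name:17s} {value:.2f}x")
```

On these numbers the gain from +multi over the multi-threaded run is largest for cube pruning, which seems to be where the "about 2.5x" figure comes from; for stack decoding the gain is smaller.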
