Hi Philipp,

Unfortunately I don't have a precise measurement.  If anyone knows of a
good way to measure the memory use of a process tree in which many
processes memory-map the same files, I would be glad to run it.

--Michael

On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> great - that will be very useful.
>
> Since you just ran the comparison - do you have any numbers on "still
> allowed everything to fit into memory", i.e., how much more memory is used
> by running parallel instances?
>
> -phi
>
> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Like some other Moses users, I noticed diminishing returns from running
>> Moses with several threads.  To work around this, I added a script to run
>> multiple single-threaded instances of moses instead of one multi-threaded
>> instance.  In practice, this sped things up by about 2.5x on 16 CPUs, and
>> using memory-mapped models still allowed everything to fit into memory.
>>
>> If anyone else is interested in using this, you can prefix a moses
>> command with scripts/generic/multi_moses.py.  To use multiple instances in
>> mert-moses.pl, specify --multi-moses and control the number of parallel
>> instances with --decoder-flags='-threads N'.
>>
>> Below is a benchmark on WMT fr-en data (2M training sentences, 400M words
>> mono, suffix array PT, compact reordering, 5-gram KenLM) testing default
>> stack decoding vs cube pruning without and with the parallelization script
>> (+multi):
>>
>> ---
>> 1cpu   sent/sec
>> stack      1.04
>> cube       2.10
>> ---
>> 16cpu  sent/sec
>> stack      7.63
>> +multi    12.20
>> cube       7.63
>> +multi    18.18
>> ---
>>
>> --Michael
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
