I sent this to the IRSTLM list, but I am also posting it here in case
this team has any comments.

Hieu recently checked in changes to the build-lm.sh script to run the
splits in parallel. About 6 months ago, we replaced IRSTLM's shell
script with a Python wrapper to give us more control in our
environment. We also prepared to multi-process the splits, but we
stopped work because of concerns that parallel processing might
overload system RAM.

As we know, building LMs is memory intensive. Without parallel
processing, each serialized split can use 100% of the host's RAM, but
the extra CPU cores sit idle. Parallel processing uses all CPUs, but
the concurrent processes then compete for the same RAM.

        * Is the final result of a build identical if you build with
          one chunk, 3 splits or 30 splits?
        * Are there any advantages or disadvantages to using a large
          number of splits with a queue manager, so as to run no more
          processes in parallel than there are CPUs and to reduce the
          RAM requirement with more but smaller splits?
        * Has anyone experimented with other ways to reduce the RAM
          requirement for each process while still allowing them to
          run in parallel?
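
To make the queue-manager idea in the second question concrete, here is
a minimal Python sketch of what I have in mind. It is not our wrapper
and not what build-lm.sh does. The names max_workers, run_splits and
fake_process_split are made up for illustration, and the RAM figures
are placeholder guesses rather than measurements. The real per-split
command would replace the stand-in function.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def max_workers(cpu_count, ram_budget_mb, est_ram_per_split_mb):
    """Cap concurrency by CPU count and by an estimated per-split RAM need."""
    by_ram = max(1, ram_budget_mb // est_ram_per_split_mb)
    return max(1, min(cpu_count, by_ram))


def run_splits(splits, process_split, workers):
    """Run process_split on every split, at most `workers` at a time.

    Results are returned in the same order as `splits`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_split, splits))


def fake_process_split(split_id):
    # Hypothetical stand-in for launching the real per-split job
    # (for example with subprocess.run); here it only returns a label.
    return f"split-{split_id}-done"


if __name__ == "__main__":
    # Assumed numbers: 16 GB usable RAM, roughly 4 GB per split.
    workers = max_workers(os.cpu_count() or 1,
                          ram_budget_mb=16000,
                          est_ram_per_split_mb=4000)
    print(run_splits(range(30), fake_process_split, workers))
```

Threads are enough here on the assumption that each worker only waits
on an external process. With 30 splits queued, no more than `workers`
of them would be running at any one time.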

Tom 

