Yeah, my changes run all the splits at the same time. You're screwed if 
you don't have enough memory.

It's kinda a first version of the parallelization, which I haven't tested on 
real data yet. When I do, I'll let you know, and we may have to tweak the 
script.
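
Roughly, the idea is just to kick off every split at once and wait for them 
all, something like this (only a Python sketch, not the actual script; the 
"build-one-sublm" command and the split file names are placeholders):

    import subprocess

    # Placeholder names: "build-one-sublm" stands in for whatever command
    # builds the sub-LM for one split.
    splits = ["split.000", "split.001", "split.002"]

    # Launch every split at the same time...
    procs = [subprocess.Popen(["build-one-sublm", s]) for s in splits]

    # ...and wait for all of them. Peak RAM is roughly the sum over all
    # the splits running at once.
    for p in procs:
        p.wait()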

On 11/04/2012 17:24, Nicola Bertoldi wrote:
> Hi Tom
>
> I'll try to answer below
>
> On Apr 9, 2012, at 3:58 PM, Tom Hoar wrote:
>
>
> I sent this to the irstlm list, but I am also including it here in case this 
> team has any comments.
>
> Hieu recently checked in changes to the build-lm.sh script to run the splits 
> in parallel. About 6 months ago, we replaced IRSTLM's shell script with a 
> Python wrapper to give us more control in our environment. We also prepared 
> to multi-process the splits. We stopped work because of concerns that 
> parallel processing might overload system RAM resources.
>
> I think that Hieu's change made the empirical assumption that the number of 
> splits would not exceed the number of CPUs. And in any case, even with 
> parallelization we are not guaranteed to avoid running out of memory.
>
>
> As we know, building LMs is memory intensive. Without parallel 
> processing, each serialized split can use 100% of the host's RAM, but the 
> extra CPU cores sit idle. Parallel processing uses all CPUs, but each CPU 
> competes for RAM resources.
>
>    1.  Is the final result of a build identical if you build with one chunk 
> or 3 splits or 30 splits?
>
> YES, the regression test build-lm-sublm2 checks for that (1 split vs. 5 
> splits).
>
>
>    2.  Are there any advantages/disadvantages to using a large number of splits 
> with a queue manager, so as to only run up to the maximum number of CPUs in 
> parallel and reduce the RAM requirements with more but smaller splits?
>
> The main rules to take into account are the following:
> - the smaller the splits, the lower the RAM requirement for each single split
> - the larger the number of splits, the longer the time to merge the results 
>   (though this is not a very big issue)
>
> Hence, I think that, if a queue manager like the one you are proposing is 
> available, the best policy is to use more but smaller splits.
>
> I am going to write such a manager, because I think it is a good enhancement 
> of the IRSTLM toolkit.
> Do you already have something written in Python that I can mimic in my scripts?
>
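> A rough sketch of what I have in mind (only an illustration, assuming a 
> worker pool capped at the number of CPUs; the "build-one-sublm" command and 
> the split file names are placeholders):
>
>     import multiprocessing
>     import subprocess
>
>     # Placeholder: stands in for whatever command builds the sub-LM
>     # for one split.
>     def build_one(split):
>         subprocess.check_call(["build-one-sublm", split])
>
>     if __name__ == "__main__":
>         # Many small splits, but only cpu_count() of them running
>         # (and using RAM) at any one time.
>         splits = ["split.%03d" % i for i in range(30)]
>         pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
>         pool.map(build_one, splits)
>         pool.close()
>         pool.join()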
>
> The best tradeoff between the number of splits (and hence the RAM 
> requirements) and computation time should be found by means of some 
> experimentation, on different machines with different RAM sizes, different 
> numbers of threads, and so on.
>
>    3.  Has anyone experimented with other ways to reduce the RAM requirement 
> for each process while still allowing them to run in parallel?
>
> No, not at FBK.
>
>
> Tom
>
>
> best regards,
> Nicola Bertoldi
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
