Yee Seng Chan wrote:
> However, when I tried to parallelize it by submitting say.. 10  
> jobs, I don’t get faster MERT iterations. In fact, it’s slower.  
> Sometimes, a job can be stuck on one of the grid nodes and after  
> hours, it’s still not completed. Its corresponding output-file e.g…  
> out.job32010-aa doesn’t get updated as well.
>
> Was wondering whether I could get any feedback to see what’s  
> slowing things down, or why the job gets stuck.
>
> I’m using grid version < 6.0, so I used the old-sge option. I’ve  
> also tried with/without free_mem=0.5G ; but doesn’t seem to make  
> much of a difference.
>

In our experience, the SGE option works quite well - I don't know if  
we'd be able to use Moses without it.  We allow 100 jobs at once.   
Are you sure 0.5G is sufficient?  We use 1G:

   --queue-flags='-hard -l mem_free=1G' --jobs=100 --nbest=40

I think I remember seeing behavior like yours when I tried lowering  
the memory limit, but we train on 2-3 million sentence pairs, and the  
models are correspondingly large.  Half a gig may indeed be  
sufficient for you, but if you have any grid nodes with more memory,  
I'd try raising that limit.

Are your SGE jobs in an error state, or just proceeding very slowly?
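
One rough way to answer that question is to look at the state column  
of plain qstat output.  The Python sketch below is only an  
illustration, not part of Moses or SGE: the function name  
find_stuck_jobs is made up, and it assumes the default qstat layout,  
where the job ID is the first column and the state is the fifth, with  
an "E" in the state (such as Eqw) indicating an error.

```python
def find_stuck_jobs(qstat_text):
    """Split the jobs in plain `qstat` output into errored and other jobs.

    Assumes the default column order: job-ID, prior, name, user, state, ...
    Returns two lists of (job_id, name, state) tuples.
    """
    errored, others = [], []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; this skips the header
        # line and the dashed rule underneath it.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        job_id, name, state = fields[0], fields[2], fields[4]
        # Assumption: an "E" in the state column (e.g. "Eqw") means error.
        if "E" in state:
            errored.append((job_id, name, state))
        else:
            others.append((job_id, name, state))
    return errored, others
```

Feed it the text printed by qstat.  Jobs in the first list are worth  
inspecting with "qstat -j <job id>"; jobs in the second list that have  
sat in state "r" for hours are merely slow or hung rather than errored.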

Hope this helps a bit.

- John Burger
   MITRE
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
