Hi Hieu (and all),

I have created a branch called mert-sge-nosync and uploaded the scripts for doing MERT in SGE no-sync mode. I have done git add and git commit (not sure whether I have to push, though). Altogether 12 files are added:
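On the push question: git commit only records the change locally, so the branch does need a git push before it is visible to others on the shared repository. A minimal sketch of the workflow, demonstrated against a throwaway local "remote" so the commands are safe to run anywhere (paths and commit message are illustrative):

```shell
# Set up a disposable bare repository to stand in for the GitHub remote.
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work" && cd "$tmp/work"
git remote add origin "$tmp/remote.git"

# Create the branch and commit the new scripts (placeholder file here).
git checkout -q -b mert-sge-nosync
mkdir -p scripts/training/sge-nosync
touch scripts/training/sge-nosync/poll-decoder.pl
git add scripts
git -c user.email=you@example.org -c user.name=you \
    commit -q -m "Add SGE no-sync MERT scripts"

# This is the step that publishes the branch to the remote.
git push -q origin mert-sge-nosync
git ls-remote --heads origin   # the branch is now visible remotely
```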
scripts/generic/moses-parallel-sge-nosync.pl
scripts/generic/qsub-wrapper-exit-sge-nosync.pl
scripts/generic/qsub-wrapper-sge-nosync.pl
scripts/training/mert-moses-sge-nosync.pl
scripts/training/sge-nosync/cleartmpfiles.pl
scripts/training/sge-nosync/create-config-sge-nosync.pl
scripts/training/sge-nosync/moses-parallel-postdecode-sge-nosync.pl
scripts/training/sge-nosync/poll-decoder.pl
scripts/training/sge-nosync/process-featlist-sge-nosync.pl
scripts/training/sge-nosync/process-moses-result-sge-nosync.pl
scripts/training/sge-nosync/run-decoder-sge-nosync.pl
scripts/training/sge-nosync/zipextract-decoder-result.pl

You will also need some SSH packages for Perl to run these scripts. I have written up some details at http://staffwww.dcs.shef.ac.uk/people/W.Ng/MERT.html

Thanks
raymond

On 27 April 2014 00:32, Hieu Hoang <[email protected]> wrote:

> It looks like there is interest in your changes. You should add it and
> let other people play with it.
>
> I'm not sure what's the best way to add it; scripts are sometimes a
> little fragile. It's probably best if you add it to a separate directory
> or a branch, rather than try to change or replace what's there. It would
> also be good if you can stay on the Moses mailing list after you've
> added it, in case of trouble.
>
> On 18 April 2014 21:59, Raymond W. M. Ng <[email protected]> wrote:
>
>> Hi all,
>>
>> Thanks for the reply. I will try creating a branch in the git
>> repository later.
>>
>> Ondrej> I have reasonably reliable disk access on my side, so there are
>> almost no failure cases. I do recall some job failures in the course of
>> development, when a newly created process (on machine B) happened to
>> take an existing process id (from an earlier job on machine A) and then
>> they crashed. I got around this by appending the start time to the
>> process id as well. Anyway, I kept the "retry for 5 times" behaviour in
>> moses-parallel-decode.
>> When one of the splits fails, the parallel decode is incomplete and
>> translation will restart, up to 5 failures. I haven't tested what would
>> happen after 5 failures, simply because of the hardware setting I
>> mentioned above.
>>
>> I am new to the Moses code, so when I started modifying it I kept the
>> originals and renamed the related files with the suffix -sge-nosync
>> (e.g. qsub-wrapper-sge-nosync.pl). There are in total 12 Perl scripts,
>> distributed in $MOSES/scripts/training and $MOSES/scripts/generic.
>>
>> Best
>> raymond
>>
>> On 16 April 2014 21:44, Ondrej Bojar <[email protected]> wrote:
>>
>>> Hi, Raymond.
>>>
>>> Interesting. Is your parallelization also tolerant to random job
>>> failures? How does it decide when to stop waiting? Can it not degrade
>>> to optimizing on only e.g. one of the splits, with all the others
>>> failing?
>>>
>>> An option to commit your code in a more visible way is to put it in
>>> the main branch under a different name, if the change affects just one
>>> script. But I agree it's not very nice and clean.
>>>
>>> Cheers, O.
>>>
>>> On April 16, 2014 5:38:35 PM CEST, Hieu Hoang <[email protected]> wrote:
>>> > hi raymond
>>> >
>>> > you're welcome to create a branch on the Moses github repository and
>>> > add your code there. It's unlikely anyone will look at it or use it,
>>> > but at least it won't get lost.
>>> >
>>> > Maybe in future, you or someone else might want to merge it with the
>>> > master branch, where it will get much more exposure.
>>> >
>>> > On 16 April 2014 16:21, Raymond W. M. Ng <[email protected]> wrote:
>>> >
>>> >> Dear Ondrej,
>>> >>
>>> >> I checked with Hieu when I met him in February; it seems that the
>>> >> SGE submission in MERT uses the -sync mode, which makes submission
>>> >> difficult (the user stays in the submission state until all the
>>> >> jobs end). In short, the modification runs in a "no-sync" mode.
>>> >>
>>> >> In terms of efficiency, for the reasons you have mentioned, the
>>> >> combined wallclock time of N machines (N times the actual program
>>> >> runtime) may be longer than single-threaded execution. But in a lot
>>> >> of shared computing environments, single-threaded job execution for
>>> >> 20+ hours is not favourable (and sometimes disallowed). So by using
>>> >> the parallel mode, the runtime of individual jobs is shortened. In
>>> >> my experience, using the parallel mode shortens the tuning time from
>>> >> 30 hours (single-threaded) to 3 hours (20 threads on different
>>> >> machines). We have InfiniBand access among the nodes, which is a bit
>>> >> more sophisticated than NFS mounting, though.
>>> >>
>>> >> best
>>> >> raymond
>>> >>
>>> >> On 16 April 2014 13:48, Ondrej Bojar <[email protected]> wrote:
>>> >>
>>> >>> Dear Raymond,
>>> >>>
>>> >>> The existing scripts have always allowed running MERT in parallel
>>> >>> jobs on SGE; one just had to use generic/moses-parallel as the
>>> >>> "moses executable".
>>> >>>
>>> >>> Is there some other functionality that your modifications now
>>> >>> bring?
>>> >>>
>>> >>> Btw, in my experience, parallelization into SGE jobs can be even
>>> >>> less efficient than single-job multi-threaded execution. It is hard
>>> >>> to describe the circumstances exactly, but in general, if your
>>> >>> models are big and loaded from NFS, and you run many experiments at
>>> >>> the same time, the slowdown of the network multiplied by the many
>>> >>> SGE jobs makes the parallelization much more wasteful and sometimes
>>> >>> slower (in wallclock time).
>>> >>>
>>> >>> Cheers, Ondrej.
>>> >>>
>>> >>> On April 16, 2014 1:07:37 PM CEST, "Raymond W. M. Ng" <
>>> >>> [email protected]> wrote:
>>> >>> > Hi Moses support,
>>> >>> >
>>> >>> > Not sure I am making this enquiry on the right mailing list...
>>> >>> > I have some modified scripts for parallel MERT tuning which can
>>> >>> > run on SGE.
>>> >>> > Now I would like to share them. They are based on an old version
>>> >>> > of Moses (around April 2012); what is the best way of sharing?
>>> >>> >
>>> >>> > Best
>>> >>> > raymond
>>> >>>
>>> >>> --
>>> >>> Ondrej Bojar (mailto:[email protected] / [email protected])
>>> >>> http://www.cuni.cz/~obo
>>>
>>> --
>>> Ondrej Bojar (mailto:[email protected] / [email protected])
>>> http://www.cuni.cz/~obo
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
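The failure handling discussed in the thread (making per-job work ids unique by appending the start time to the process id, and retrying a failed split up to 5 times) boils down to a pattern like the one below. This is a hedged sketch, not the actual moses-parallel code; run_split is a stand-in for the real decoder submission, here faked with a marker file so the loop is runnable as-is:

```shell
# Unique work id: the PID alone can be recycled by a later job on another
# machine, so the start time is appended to disambiguate.
run_id="$$.$(date +%s)"
echo "work id: $run_id"

# Stand-in for one decoder split; in the real scripts this would be the
# qsub/decoder invocation. Here it "fails" until a marker file appears.
run_split() {
  test -f "/tmp/split_ok.$run_id"
}

# Retry a failed split, giving up after 5 attempts.
attempt=1
until run_split; do
  if [ "$attempt" -ge 5 ]; then
    echo "split failed after 5 attempts" >&2
    exit 1
  fi
  attempt=$((attempt + 1))
  touch "/tmp/split_ok.$run_id"   # in reality: resubmit the decoder job
done
echo "split completed on attempt $attempt"
rm -f "/tmp/split_ok.$run_id"
```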
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
