Hi Hieu (and all),

I have created a branch called mert-sge-nosync and uploaded the scripts for
doing MERT in SGE no-sync mode. I have done git add and git commit (a git
push is still needed before the branch is visible on the remote, I believe).
Altogether 12 files are added:

scripts/generic/moses-parallel-sge-nosync.pl
scripts/generic/qsub-wrapper-exit-sge-nosync.pl
scripts/generic/qsub-wrapper-sge-nosync.pl
scripts/training/mert-moses-sge-nosync.pl
scripts/training/sge-nosync/cleartmpfiles.pl
scripts/training/sge-nosync/create-config-sge-nosync.pl
scripts/training/sge-nosync/moses-parallel-postdecode-sge-nosync.pl
scripts/training/sge-nosync/poll-decoder.pl
scripts/training/sge-nosync/process-featlist-sge-nosync.pl
scripts/training/sge-nosync/process-moses-result-sge-nosync.pl
scripts/training/sge-nosync/run-decoder-sge-nosync.pl
scripts/training/sge-nosync/zipextract-decoder-result.pl

You will also need some SSH packages for Perl to run these scripts. I have
written up some details at
http://staffwww.dcs.shef.ac.uk/people/W.Ng/MERT.html
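
For reference, here is a minimal sketch of running a remote command from
Perl, assuming the Net::OpenSSH module and a hypothetical host name (the
exact SSH module these scripts use may differ; check their "use" lines):

use strict;
use warnings;
use Net::OpenSSH;

# Connect to a (hypothetical) compute node.
my $ssh = Net::OpenSSH->new('node01.example.org');
$ssh->error and die "ssh connection failed: " . $ssh->error;

# Run a command remotely; the single quotes leave $USER for the
# remote shell to expand.
my $qstat = $ssh->capture('qstat -u $USER');
print $qstat;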

Thanks
raymond


On 27 April 2014 00:32, Hieu Hoang <[email protected]> wrote:

> it looks like there is interest in your changes. You should add them and
> let other people play.
>
> I'm not sure what's the best way to add them; scripts are sometimes a
> little fragile. It's probably best if you add them to a separate directory
> or a branch, rather than try to change or replace what's there. It'll also
> be good if you can stay on the Moses mailing list after you've added them,
> in case of trouble
>
>
> On 18 April 2014 21:59, Raymond W. M. Ng <[email protected]> wrote:
>
>> Hi all,
>>
>> Thanks for the reply. I will try creating a branch in the git repository
>> later.
>>
>> Ondrej> I have reasonably reliable disk access on my side, so there are
>> almost no failure cases. I do recall some job failures in the course of
>> development, when a newly created process (on machine B) happened to take
>> an existing process id (from an earlier job on machine A) and the two
>> crashed. I got around this by appending the start time to the process id
>> as well... anyway, I kept the "retry up to 5 times" logic in
>> moses-parallel-decode. When one of the splits fails, the parallel decode
>> is incomplete and translation restarts, up to 5 failures. I haven't
>> tested what happens after 5 failures, simply because of the hardware
>> setup I mentioned above.
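>>
>> As an illustration (not the actual script code), the collision fix and
>> the retry loop amount to something like this, where run_one_split() and
>> decode-split.sh are hypothetical stand-ins for submitting and waiting on
>> a single split:
>>
>> use strict;
>> use warnings;
>>
>> # Tag jobs with the pid plus the start time, so a recycled pid on
>> # another machine cannot collide with an earlier job's files.
>> my $job_tag = "$$." . time();
>>
>> my $max_retries = 5;
>> my $ok = 0;
>> for my $attempt (1 .. $max_retries) {
>>     $ok = run_one_split($job_tag);
>>     last if $ok;
>>     warn "split failed (attempt $attempt of $max_retries), retrying\n";
>> }
>> die "giving up after $max_retries failed attempts\n" unless $ok;
>>
>> sub run_one_split {
>>     my ($tag) = @_;
>>     # Hypothetical helper: submit one split and report success.
>>     return system("./decode-split.sh", $tag) == 0;
>> }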
>>
>> I am new to the Moses code, so when I started modifying it, I kept the
>> originals and renamed the related files with the suffix -sge-nosync
>> (e.g. qsub-wrapper-sge-nosync.pl). There are in total 12 Perl scripts,
>> distributed between $MOSES/scripts/training and
>> $MOSES/scripts/generic.
>>
>> Best
>> raymond
>>
>>
>>
>>
>>
>> On 16 April 2014 21:44, Ondrej Bojar <[email protected]> wrote:
>>
>>> Hi, Raymond.
>>>
>>> Interesting. Is your parallelization also tolerant to random job
>>> failures? How does it decide when to stop waiting? Can it not degrade to
>>> optimizing on only, e.g., one of the splits, with all the others failing?
>>>
>>> An option to commit your code in a more visible way is to put it in the
>>> main branch under a different name, if the change affects just one script.
>>> But I agree it's not very nice and clean.
>>>
>>> Cheers, O.
>>>
>>> On April 16, 2014 5:38:35 PM CEST, Hieu Hoang <[email protected]>
>>> wrote:
>>> >hi raymond
>>> >
>>> >you're welcome to create a branch on the Moses github repository and
>>> >add your code there. It's unlikely anyone will look at it or use it,
>>> >but at least it won't get lost.
>>> >
>>> >Maybe in future, you or someone else might want to merge it with the
>>> >master branch, where it will get much more exposure
>>> >
>>> >
>>> >On 16 April 2014 16:21, Raymond W. M. Ng <[email protected]> wrote:
>>> >
>>> >> Dear Ondrej,
>>> >>
>>> >> I checked with Hieu when I met him in February; it seems that the
>>> >> SGE submission in MERT uses the -sync mode, which makes submission
>>> >> awkward (the user's submitting process stays alive until all the
>>> >> jobs end). In short, the modification runs in a "no-sync" mode.
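>>> >>
>>> >> Schematically, the difference is the following (hypothetical command
>>> >> lines; job.sh stands in for the real decoder job script):
>>> >>
>>> >> use strict;
>>> >> use warnings;
>>> >>
>>> >> # -sync y: the submitting process blocks until the job finishes.
>>> >> system("qsub -sync y job.sh") == 0 or die "sync submission failed\n";
>>> >>
>>> >> # no-sync: qsub -terse prints just the job id and returns at once;
>>> >> # poll qstat until the scheduler no longer knows the job.
>>> >> my ($job_id) = `qsub -terse job.sh` =~ /^(\d+)/;
>>> >> die "submission failed\n" unless defined $job_id;
>>> >> while (system("qstat -j $job_id >/dev/null 2>&1") == 0) {
>>> >>     sleep 30;    # still queued or running
>>> >> }
>>> >> print "job $job_id finished\n";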
>>> >>
>>> >> In terms of efficiency, for the reasons you mention, the combined
>>> >> wallclock time across N machines (N times the actual program runtime)
>>> >> may be longer than a single-threaded execution. But in a lot of shared
>>> >> computing environments, a single-threaded job running for 20+ hours
>>> >> is not favoured (and sometimes disallowed). So by using the parallel
>>> >> mode, the runtime of each individual job is shortened. In my
>>> >> experience, the parallel mode shortens tuning time from 30 hours
>>> >> (single-threaded) to 3 hours (20 threads on different machines). We
>>> >> have InfiniBand links between the nodes, though, which is a bit more
>>> >> sophisticated than NFS mounting.
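>>> >>
>>> >> (For scale: an ideal 20-way split of a 30-hour job would take 30/20 =
>>> >> 1.5 hours, so the observed 3 hours corresponds to roughly 50% parallel
>>> >> efficiency, the remainder presumably going to splitting, per-job model
>>> >> loading and scheduler overhead.)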
>>> >>
>>> >> best
>>> >> raymond
>>> >>
>>> >>
>>> >> On 16 April 2014 13:48, Ondrej Bojar <[email protected]> wrote:
>>> >>
>>> >>> Dear Raymond,
>>> >>>
>>> >>> The existing scripts have always allowed running MERT in parallel
>>> >>> jobs on SGE; one just had to use generic/moses-parallel as the
>>> >>> "moses executable".
>>> >>>
>>> >>> Is there some other functionality that your modifications now bring?
>>> >>>
>>> >>> Btw, in my experience, parallelization into SGE jobs can be even
>>> >>> less efficient than single-job multi-threaded execution. It is hard
>>> >>> to describe the circumstances exactly, but in general, if your
>>> >>> models are big and loaded from NFS, and you run many experiments at
>>> >>> the same time, the slowdown of the network multiplied across the
>>> >>> many SGE jobs makes the parallelization much more wasteful and
>>> >>> sometimes slower (in wallclock time).
>>> >>>
>>> >>> Cheers, Ondrej.
>>> >>>
>>> >>> On April 16, 2014 1:07:37 PM CEST, "Raymond W. M. Ng" <
>>> >>> [email protected]> wrote:
>>> >>> >Hi Moses support,
>>> >>> >
>>> >>> >Not sure if I am making this enquiry on the right mailing list...
>>> >>> >I have some modified scripts for parallel MERT tuning which can run
>>> >>> >on SGE. Now I would like to share them. They are based on an old
>>> >>> >version of moses (around April 2012); what is the best way to
>>> >>> >share?
>>> >>> >
>>> >>> >Best
>>> >>> >raymond
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Ondrej Bojar (mailto:[email protected] / [email protected])
>>> >>> http://www.cuni.cz/~obo
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>>
>>> --
>>> Ondrej Bojar (mailto:[email protected] / [email protected])
>>> http://www.cuni.cz/~obo
>>>
>>>
>>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
