OK.
Running EMS on SGE went through. Last glitch: the XML::Twig Perl module needs
to be installed on the nodes.
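
On Ubuntu that should just be one package (assuming the stock package
name; not verified on the nodes yet):

    sudo apt-get install libxml-twig-perl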

Key points to memorize:
SGE: have the same user/password on all nodes and the master, and have a
Java JRE installed on the nodes and the master.
DO NOT use SMB to mount the share; use NFS.
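
For reference, a minimal NFS sketch (export path and mount options are
assumptions, not taken from this setup):

    # on the master, in /etc/exports:
    /home/moses  node1(rw,sync) node2(rw,sync)

    # on each node, in /etc/fstab:
    master:/home/moses  /netshr  nfs  defaults  0 0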

Last thing I can't figure out:
How do the Moses steps handle the number of jobs submitted versus -threads
in the various steps?


On 29/10/2015 17:45, Vincent Nguyen wrote:
> Ken,
>
> I just did some further testing on the master node, which HAS everything installed.
> Same error as before.
>
> /netshr/mosesdecoder/bin/lmplz --text 
> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa 
> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T 
> /netshr/working-en-fr/lm -S 20%
>
> /netshr is a mounting point of /home/moses/
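>
> A quick check of the mount (standard commands, a sketch):
>
> mount | grep netshr
> df -h /netshr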
>
> So what I did was replace /netshr/ with /home/moses/ in the
> first 2 instances => same error.
>
> If I replace /netshr with /home/moses in the -T option,
> it works.
>
> So obviously there is an issue here.
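>
> So the working invocation (only the -T argument changed) is:
>
> /netshr/mosesdecoder/bin/lmplz --text
> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
> /home/moses/working-en-fr/lm -S 20%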
>
>
>
> On 29/10/2015 17:31, Kenneth Heafield wrote:
>> So we're clear, it runs correctly on the local machine but not when you
>> run it through SGE?  In that case, I suspect it's library version
>> differences.
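>>
>> One standard way to check (a sketch): run ldd on the local machine and
>> on each node, then diff the outputs:
>>
>> ldd /netshr/mosesdecoder/bin/lmplz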
>>
>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>> I get this error:
>>>
>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz
>>> --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>> /netshr/working-en-fr/lm -S 20%
>>> === 1/5 Counting and sorting n-grams ===
>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>> ****************************************************************************************************
>>> Segmentation fault (core dumped)
>>> moses@sgenode1:/netshr/working-en-fr$
>>>
>>> I installed libgoogle-perftools-dev, but I get the same error.
>>> Just to be clear: are all the packages below only needed to build
>>> Moses, or do I need specific packages to run one binary or another?
>>> Confused...
>>>
>>>
>>>         Ubuntu
>>>
>>> Install the following packages using the command
>>>
>>>      sudo apt-get install [package name]
>>>
>>> Packages:
>>>
>>>      g++
>>>      git
>>>      subversion
>>>      automake
>>>      libtool
>>>      zlib1g-dev
>>>      libboost-all-dev
>>>      libbz2-dev
>>>      liblzma-dev
>>>      python-dev
>>>      graphviz
>>>      imagemagick
>>>      libgoogle-perftools-dev (for tcmalloc)
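>>>
>>> Or the whole list in one command (the same packages as above):
>>>
>>>      sudo apt-get install g++ git subversion automake libtool zlib1g-dev \
>>>          libboost-all-dev libbz2-dev liblzma-dev python-dev graphviz \
>>>          imagemagick libgoogle-perftools-dev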
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>> Hi,
>>>>
>>>> make sure that all the paths are valid on all the nodes --- so
>>>> definitely no relative paths.
>>>> And of course, the binaries need to be executable on all nodes as well.
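>>>>
>>>> A quick sanity check (a sketch; master/node1/node2 as named below):
>>>>
>>>> for n in master node1 node2; do
>>>>     ssh $n 'test -x /netshr/mosesdecoder/bin/lmplz && echo "$(hostname) OK"'
>>>> done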
>>>>
>>>> -phi
>>>>
>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <[email protected]> wrote:
>>>>
>>>>
>>>>       OK guys, not easy stuff ...
>>>>       I fought to get the prerequisites working, but now at least
>>>>       jobs start .....
>>>>
>>>>       and crash.
>>>>
>>>>       I'll post the details of the preliminary steps later; they could
>>>>       be useful.
>>>>
>>>>       My crash is when lmplz starts.
>>>>
>>>>       I have a share mounted on my nodes, and all binaries are visible
>>>>       from the nodes, including the lmplz program.
>>>>
>>>>       But I was thinking: do I need to actually install some packages on
>>>>       the nodes themselves? I mean packages that do not fall under the
>>>>       /mosesdecoder/ folder?
>>>>
>>>>
>>>>       thanks,
>>>>
>>>>       V
>>>>
>>>>
>>>>
>>>>       On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>       Hi,
>>>>>
>>>>>       these machine names are just there for convenience.
>>>>>
>>>>>       If you want experiment.perl to submit jobs per qsub,
>>>>>       all you have to do is to run experiment.perl with the
>>>>>       additional switch "-cluster".
>>>>>
>>>>>       You can also put the head node's name into the
>>>>>       experiment.machines file, then you do not need to
>>>>>       use the switch anymore.
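>>>>>
>>>>>       For example (a sketch, assuming the head node is called "master"
>>>>>       as below), extend the cluster line:
>>>>>
>>>>>       cluster: master
>>>>>
>>>>>       Then a plain "experiment.perl -config <config> -exec" should
>>>>>       submit jobs per qsub without the -cluster switch.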
>>>>>
>>>>>       -phi
>>>>>
>>>>>       On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>
>>>>>           Hi there,
>>>>>
>>>>>           I need some clarification before screwing up some files.
>>>>>           I just set up an SGE cluster with a master + 2 nodes.
>>>>>
>>>>>           To make it clear, let's say my cluster name is "default", my
>>>>>           master headnode is "master", and my 2 other nodes are "node1"
>>>>>           and "node2".
>>>>>
>>>>>
>>>>>           For EMS:
>>>>>
>>>>>           I opened the default experiment.machines file and I see:
>>>>>
>>>>>           cluster: townhill seville hermes lion seville sannox lutzow frontend
>>>>>           multicore-4: freddie
>>>>>           multicore-8: tyr thor odin crom
>>>>>           multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>           multicore-24: syn hel skaol saga buri loki sif magni
>>>>>           multicore-32: gna snotra lofn thrud
>>>>>
>>>>>           townhill and the others are what? Machine/node names? Names
>>>>>           of several clusters?
>>>>>           Should I just put "default" or "master node1 node2"?
>>>>>
>>>>>           multicore-X: should I put machine names here?
>>>>>           If my 3 machines are 8 cores each:
>>>>>           multicore-8: master node1 node2
>>>>>           right?
>>>>>
>>>>>
>>>>>           Then, in the config file for EMS:
>>>>>
>>>>>           #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>           #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>
>>>>>           Which one should I take if my nodes are multicore? Still the
>>>>>           first one?
>>>>>
>>>>>
>>>>>           ### cluster settings (if run on a cluster machine)
>>>>>           # number of jobs to be submitted in parallel
>>>>>           #
>>>>>           #jobs = 10
>>>>>           Should I count approx. 1 job per core over the total cores of
>>>>>           my 3 machines?
>>>>>
>>>>>           # arguments to qsub when scheduling a job
>>>>>           #qsub-settings = ""
>>>>>           Can this stay empty?
>>>>>
>>>>>           # project for privileges and usage accounting
>>>>>           #qsub-project = iccs_smt
>>>>>           Is this a standard value?
>>>>>
>>>>>           # memory and time
>>>>>           #qsub-memory = 4
>>>>>           #qsub-hours = 48
>>>>>           4 what? GB?
>>>>>
>>>>>           ### multi-core settings
>>>>>           # when the generic parallelizer is used, the number of cores
>>>>>           # specified here
>>>>>           cores = 4
>>>>>           Is this ignored if generic-parallelizer.perl is chosen?
>>>>>
>>>>>
>>>>>           Is there a way to put more load on one specific node?
>>>>>
>>>>>           Many thanks,
>>>>>           V.
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
