OK, running EMS on SGE went through. Last glitch: the XML::Twig tools need to be installed on the nodes.
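For reference, a sketch of installing that module on each compute node; the package name below is the Debian/Ubuntu one and is an assumption on my part, adjust for other distributions:

```shell
# Install the XML::Twig Perl module on every compute node
# (Debian/Ubuntu package name; elsewhere, e.g. via CPAN: cpan XML::Twig)
sudo apt-get install -y libxml-twig-perl
```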
Key points to memorize for SGE:
- have the same user/password on all nodes and the master
- have a Java JRE installed on the nodes and the master
- do NOT use SMB to mount the share point; use NFS

Last thing I can't figure out: how do the Moses steps deal with the number of jobs submitted versus -threads in the various steps?

On 29/10/2015 17:45, Vincent Nguyen wrote:
> Ken,
>
> I just did some further testing on the master node that HAS everything installed.
> Same error as before.
>
> /netshr/mosesdecoder/bin/lmplz --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T /netshr/working-en-fr/lm -S 20%
>
> /netshr is a mount point of /home/moses/
>
> So what I did is replace /netshr/ with /home/moses/ in the
> first 2 instances => same error.
>
> If I replace /netshr with /home/moses in the -T option,
> it works.
>
> So obviously there is an issue here.
>
> On 29/10/2015 17:31, Kenneth Heafield wrote:
>> So we're clear, it runs correctly on the local machine but not when you
>> run it through SGE? In that case, I suspect it's library version
>> differences.
>>
>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>> I get this error:
>>>
>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T /netshr/working-en-fr/lm -S 20%
>>> === 1/5 Counting and sorting n-grams ===
>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>> ****************************************************************************************************
>>> Segmentation fault (core dumped)
>>> moses@sgenode1:/netshr/working-en-fr$
>>>
>>> I installed libgoogle-perftools-dev but got the same error.
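The workaround described above (the -T temp directory must be given as the local path, not the NFS mount point) can be sketched as a plain path rewrite before launching lmplz. The paths and the lmplz invocation come from this thread; the sed rewrite itself is only an illustration:

```shell
# /netshr is an NFS mount of /home/moses, so both prefixes name the same files.
# lmplz crashed when its -T (temp dir) argument went through the mount point,
# but worked with the underlying local path, so rewrite only the -T argument:
CMD='/netshr/mosesdecoder/bin/lmplz --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T /netshr/working-en-fr/lm -S 20%'
FIXED=$(printf '%s' "$CMD" | sed 's|-T /netshr|-T /home/moses|')
printf '%s\n' "$FIXED"
```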
>>> Just to be clear: are all the packages below only necessary to build
>>> Moses, or do I need specific packages to run one binary or another?
>>> Confused....
>>>
>>> Ubuntu
>>>
>>> Install the following packages using the command
>>>
>>> sudo apt-get install [package name]
>>>
>>> Packages:
>>>
>>> g++
>>> git
>>> subversion
>>> automake
>>> libtool
>>> zlib1g-dev
>>> libboost-all-dev
>>> libbz2-dev
>>> liblzma-dev
>>> python-dev
>>> graphviz
>>> imagemagick
>>> libgoogle-perftools-dev (for tcmalloc)
>>>
>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>> Hi,
>>>>
>>>> make sure that all the paths are valid on all the nodes --- so
>>>> definitely no relative paths.
>>>> And of course, the binaries need to be executable on all nodes as well.
>>>>
>>>> -phi
>>>>
>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <[email protected]> wrote:
>>>>
>>>> OK guys, not easy stuff...
>>>> I fought to get the prerequisites working, but now at least jobs start.....
>>>>
>>>> And crash.
>>>>
>>>> I'll post the details of the preliminary steps later; they could be useful.
>>>>
>>>> My crash happens when lmplz starts.
>>>>
>>>> I have a share point mounted on my nodes, and all binaries are visible
>>>> from the nodes, including the lmplz program.
>>>>
>>>> But I was wondering: do I need to actually install some packages on
>>>> the nodes themselves? I mean packages that do not fall under the
>>>> /mosesdecoder/ folder?
>>>>
>>>> Thanks,
>>>>
>>>> V
>>>>
>>>> On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>> Hi,
>>>>>
>>>>> these machine names are just there for convenience.
>>>>>
>>>>> If you want experiment.perl to submit jobs per qsub,
>>>>> all you have to do is run experiment.perl with the
>>>>> additional switch "-cluster".
>>>>>
>>>>> You can also put the head node's name into the
>>>>> experiment.machines file; then you do not need to
>>>>> use the switch anymore.
>>>>>
>>>>> -phi
>>>>>
>>>>> On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>
>>>>> Hi there,
>>>>>
>>>>> I need some clarification before screwing up some files.
>>>>> I just set up an SGE cluster with a master + 2 nodes.
>>>>>
>>>>> To make it clear, let's say my cluster name is "default", my master
>>>>> head node is "master", and my 2 other nodes are "node1" and "node2".
>>>>>
>>>>> For EMS:
>>>>>
>>>>> I opened the default experiment.machines file and I see:
>>>>>
>>>>> cluster: townhill seville hermes lion seville sannox lutzow frontend
>>>>> multicore-4: freddie
>>>>> multicore-8: tyr thor odin crom
>>>>> multicore-16: saxnot vali vili freyja bragi hoenir
>>>>> multicore-24: syn hel skaol saga buri loki sif magni
>>>>> multicore-32: gna snotra lofn thrud
>>>>>
>>>>> What are townhill and the others? Machine/node names? Names of
>>>>> several clusters?
>>>>> Should I just put "default", or "master node1 node2"?
>>>>>
>>>>> multicore-X: should I put machine names here?
>>>>> If my 3 machines are 8 cores each:
>>>>> multicore-8: master node1 node2
>>>>> Right?
>>>>>
>>>>> Then in the config file for EMS:
>>>>>
>>>>> #generic-parallelizer =
>>>>> $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>> #generic-parallelizer =
>>>>> $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>
>>>>> Which one should I take if my nodes are multicore? Still the first one?
>>>>>
>>>>> ### cluster settings (if run on a cluster machine)
>>>>> # number of jobs to be submitted in parallel
>>>>> #
>>>>> #jobs = 10
>>>>> Should I count approx. 1 job per core over the total cores of my 3 machines?
>>>>>
>>>>> # arguments to qsub when scheduling a job
>>>>> #qsub-settings = ""
>>>>> Can this stay empty?
>>>>>
>>>>> # project for priviledges and usage accounting
>>>>> #qsub-project = iccs_smt
>>>>> Is this a standard value?
>>>>>
>>>>> # memory and time
>>>>> #qsub-memory = 4
>>>>> #qsub-hours = 48
>>>>> 4 what? GB?
>>>>>
>>>>> ### multi-core settings
>>>>> # when the generic parallelizer is used, the number of cores
>>>>> # specified here
>>>>> cores = 4
>>>>> Is this ignored if generic-parallelizer.perl is chosen?
>>>>>
>>>>> Is there a way to put more load on one specific node?
>>>>>
>>>>> Many thanks,
>>>>> V.
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
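Pulling together Philipp's answers for the cluster described in this thread (head node "master", nodes "node1" and "node2", 8 cores each), the experiment.machines entries might look like this. The hostnames are the thread's examples, and the values are illustrative rather than defaults:

```
# experiment.machines: listing the head node under "cluster:" makes
# experiment.perl submit via qsub without needing the -cluster switch;
# the multicore-N lines group machines by core count.
cluster: master
multicore-8: master node1 node2
```

And in the EMS config, the corresponding cluster settings could be sketched as:

```
### cluster settings (values illustrative)
jobs = 10            # number of jobs submitted in parallel; e.g. roughly
                     # one job per available core as a starting point
qsub-settings = ""   # extra qsub flags, e.g. "-l h_vmem=4G"; may stay empty
```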
