I'll mount the share over NFS instead and will confirm whether it works. Thanks.
On 29/10/2015 21:31, Kenneth Heafield wrote:
> Hi,
>
> The way I do temporary files is mkstemp, unlink, and then use them.
> That way the kernel will still clean up if the process meets an untimely
> death.
>
> Given that this issue appears on a SAMBA filesystem (aka SMB) but not
> on a POSIX filesystem, I'm guessing it has to do with SAMBA
> infelicities. Like this old bug:
> https://bugzilla.samba.org/show_bug.cgi?id=998 .
>
> I'd like to make it work, but temporary files on SAMBA is pretty low
> priority. However, if you can provide a backtrace (after compiling with
> "debug" added to the command) I can try to turn that segfault into an
> error message.
>
> Kenneth
>
> On 10/29/2015 08:15 PM, Vincent Nguyen wrote:
>> it's the same machine in my last test ...
>>
>> let me explain :
>>
>> Master = Ubuntu 14.04 which is my original machine with Moses and all my
>> other language tools in /home/moses
>>
>> I shared /home/moses over smb as "mosesshare"
>>
>> then on 2 new nodes, I mounted smb://master/mosesshare/ on /netshr
>>
>> did the same on master
>>
>> so cd /netshr shows the content of /home/moses absolutely perfectly on
>> the 3 machines (master and 2 nodes)
>>
>> I think you should be able to replicate without having to handle sge or
>> nodes. Just on 1 machine.
>>
>>
>> On 29/10/2015 20:59, Kenneth Heafield wrote:
>>> Yes.
>>>
>>> Also this is all very odd. What file system is /netshr ?
>>>
>>> On 10/29/2015 07:56 PM, Vincent Nguyen wrote:
>>>> Hi,
>>>> Do you think in the meantime I can just use -T with a local temporary
>>>> directory ?
>>>>
>>>> -------- Forwarded message --------
>>>> Subject : Re: [Moses-support] Moses on SGE clarification
>>>> Date : Thu, 29 Oct 2015 17:45:01 +0100
>>>> From : Vincent Nguyen <[email protected]>
>>>> To : [email protected]
>>>>
>>>>
>>>>
>>>> Ken,
>>>>
>>>> I just did some further testing on the master node that HAS everything
>>>> installed.
>>>> same error as is.
>>>>
>>>> /netshr/mosesdecoder/bin/lmplz --text
>>>> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>> /netshr/working-en-fr/lm -S 20%
>>>>
>>>> /netshr is a mount point for /home/moses/
>>>>
>>>> so what I did is that I replaced /netshr/ by /home/moses/
>>>> first 2 instances => same error
>>>>
>>>> if I replace in the -T option /netshr by /home/moses
>>>> it works.
>>>>
>>>> so obviously there is an issue here
>>>>
>>>>
>>>>
>>>> On 29/10/2015 17:31, Kenneth Heafield wrote:
>>>>> So we're clear, it runs correctly on the local machine but not when you
>>>>> run it through SGE? In that case, I suspect it's library version
>>>>> differences.
>>>>>
>>>>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>>>>> I get this error :
>>>>>>
>>>>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz
>>>>>> --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>>>> /netshr/working-en-fr/lm -S 20%
>>>>>> === 1/5 Counting and sorting n-grams ===
>>>>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>>
>>>>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>>>>> ****************************************************************************************************
>>>>>>
>>>>>> Segmentation fault (core dumped)
>>>>>> moses@sgenode1:/netshr/working-en-fr$
>>>>>>
>>>>>> I installed libgoogle-perftools-dev but got the same error.
>>>>>> Just to be clear, all these packages below are just necessary to build
>>>>>> Moses, do I need specific packages
>>>>>> to run one or other binary ?
>>>>>> confused....
>>>>>>
>>>>>>
>>>>>> Ubuntu
>>>>>>
>>>>>> Install the following packages using the command
>>>>>>
>>>>>> sudo apt-get install [package name]
>>>>>>
>>>>>> Packages:
>>>>>>
>>>>>> g++
>>>>>> git
>>>>>> subversion
>>>>>> automake
>>>>>> libtool
>>>>>> zlib1g-dev
>>>>>> libboost-all-dev
>>>>>> libbz2-dev
>>>>>> liblzma-dev
>>>>>> python-dev
>>>>>> graphviz
>>>>>> imagemagick
>>>>>> libgoogle-perftools-dev (for tcmalloc)
>>>>>>
>>>>>>
>>>>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> make sure that all the paths are valid on all the nodes --- so
>>>>>>> definitely no relative paths.
>>>>>>> And of course, the binaries need to be executable on all nodes as
>>>>>>> well.
>>>>>>>
>>>>>>> -phi
>>>>>>>
>>>>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>>>
>>>>>>>     OK guys, not easy stuff ...
>>>>>>>     I fought to get the prerequisites working but now at least
>>>>>>>     jobs start .....
>>>>>>>
>>>>>>>     and crash.
>>>>>>>
>>>>>>>     I'll post later the details of the preliminary steps, could
>>>>>>>     be useful.
>>>>>>>
>>>>>>>     my crash is when lmplz starts.
>>>>>>>
>>>>>>>     I have a sharepoint mounted on my nodes and all binaries are
>>>>>>>     visible from the nodes, including the lmplz program.
>>>>>>>
>>>>>>>     but I was thinking, do I need to actually install some
>>>>>>>     packages on the nodes themselves ? I mean packages that do
>>>>>>>     not fall under the /mosesdecoder/ folder ?
>>>>>>>
>>>>>>>
>>>>>>>     thanks,
>>>>>>>
>>>>>>>     V
>>>>>>>
>>>>>>>
>>>>>>> On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> these machine names are just there for convenience.
>>>>>>>>
>>>>>>>> If you want experiment.perl to submit jobs per qsub,
>>>>>>>> all you have to do is to run experiment.perl with the
>>>>>>>> additional switch "-cluster".
>>>>>>>>
>>>>>>>> You can also put the head node's name into the
>>>>>>>> experiment.machines file, then you do not need to
>>>>>>>> use the switch anymore.
>>>>>>>>
>>>>>>>> -phi
>>>>>>>>
>>>>>>>> On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>     Hi there,
>>>>>>>>
>>>>>>>>     I need some clarification before screwing up some files.
>>>>>>>>     I just set up an SGE cluster with a Master + 2 Nodes.
>>>>>>>>
>>>>>>>>     to make it clear let's say my cluster name is "default",
>>>>>>>>     my master headnode is "master", my 2 other nodes are
>>>>>>>>     "node1" and "node2"
>>>>>>>>
>>>>>>>>
>>>>>>>>     for EMS :
>>>>>>>>
>>>>>>>>     I opened the default experiment.machines file and I see :
>>>>>>>>
>>>>>>>>     cluster: townhill seville hermes lion seville sannox lutzow frontend
>>>>>>>>     multicore-4: freddie
>>>>>>>>     multicore-8: tyr thor odin crom
>>>>>>>>     multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>>>>     multicore-24: syn hel skaol saga buri loki sif magni
>>>>>>>>     multicore-32: gna snotra lofn thrud
>>>>>>>>
>>>>>>>>     townhill and the others are what ? machine / node names ?
>>>>>>>>     names of several clusters ?
>>>>>>>>     should I just put "default" or "master node1 node2" ?
>>>>>>>>
>>>>>>>>     multicore-X: should I put machine names here
>>>>>>>>     if my 3 machines are 8 cores each
>>>>>>>>     multicore-8: master node1 node2
>>>>>>>>     right ?
>>>>>>>>
>>>>>>>>
>>>>>>>>     then in the config file for EMS:
>>>>>>>>
>>>>>>>>     #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>>>>     #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>>>>
>>>>>>>>     which one should I take if my nodes are multicore ?
>>>>>>>>     still the first one ?
>>>>>>>>
>>>>>>>>
>>>>>>>>     ### cluster settings (if run on a cluster machine)
>>>>>>>>     # number of jobs to be submitted in parallel
>>>>>>>>     #
>>>>>>>>     #jobs = 10
>>>>>>>>     should I count approx 1 job per core on the total cores
>>>>>>>>     of my 3 machines ?
>>>>>>>>
>>>>>>>>     # arguments to qsub when scheduling a job
>>>>>>>>     #qsub-settings = ""
>>>>>>>>     can this stay empty ?
>>>>>>>>
>>>>>>>>     # project for priviledges and usage accounting
>>>>>>>>     #qsub-project = iccs_smt
>>>>>>>>     standard value ?
>>>>>>>>
>>>>>>>>     # memory and time
>>>>>>>>     #qsub-memory = 4
>>>>>>>>     #qsub-hours = 48
>>>>>>>>     4 what ? GB ?
>>>>>>>>
>>>>>>>>     ### multi-core settings
>>>>>>>>     # when the generic parallelizer is used, the number of cores
>>>>>>>>     # specified here
>>>>>>>>     cores = 4
>>>>>>>>     is this ignored if generic-parallelizer.perl is chosen ?
>>>>>>>>
>>>>>>>>
>>>>>>>>     is there a way to put more load on one specific node ?
>>>>>>>>
>>>>>>>>     Many thanks,
>>>>>>>>     V.
>>>>>>>>
>>>>>>>>
>>>>>>>>     _______________________________________________
>>>>>>>>     Moses-support mailing list
>>>>>>>>     [email protected]
>>>>>>>>     http://mailman.mit.edu/mailman/listinfo/moses-support
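
Note on the temporary-file pattern discussed in this thread: Kenneth's description ("mkstemp, unlink, and then use them") can be sketched as below. This is only an illustration in Python of the same POSIX calls, not lmplz's actual code (which is not shown in the thread); the function name open_anonymous_temp and the "scratch_" prefix are made up for the example. It assumes a POSIX filesystem where an open file can be unlinked, which is the behaviour the thread suggests the SMB mount did not provide reliably.

```python
import os
import tempfile


def open_anonymous_temp(directory=None):
    """Create a temp file, unlink it immediately, and return (fd, old_path).

    Hypothetical helper for illustration. After the unlink the file has no
    name, so the kernel reclaims the space when the descriptor is closed,
    even if the process is killed before it can clean up.
    """
    # mkstemp creates and opens the file atomically and returns a raw descriptor.
    fd, path = tempfile.mkstemp(prefix="scratch_", dir=directory)
    try:
        # Remove the directory entry; the open descriptor keeps the data alive.
        os.unlink(path)
    except OSError:
        os.close(fd)
        raise
    return fd, path


if __name__ == "__main__":
    fd, old_path = open_anonymous_temp()
    os.write(fd, b"intermediate data\n")
    os.lseek(fd, 0, os.SEEK_SET)          # rewind before reading back
    print(os.read(fd, 64))
    print(os.path.exists(old_path))       # the name is already gone
    os.close(fd)                          # storage is released here
```

Passing a local directory as the directory argument mirrors the workaround in the thread of pointing -T at /home/moses instead of the SMB-mounted /netshr.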
