Hi,
The way I do temporary files is mkstemp, unlink, and then use them.
That way the kernel will still clean up if the process meets an untimely
death.
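For anyone unfamiliar with the trick, here is a minimal sketch of that pattern in Python (the directory and payload are just placeholders): the name is deleted immediately after creation, but the open descriptor keeps the data alive until the process closes it or dies.

```python
import os
import tempfile

# Create a uniquely named temporary file; mkstemp returns an open
# descriptor plus the path it created.
fd, path = tempfile.mkstemp(dir="/tmp")

# Unlink it right away: the name disappears from the directory, but the
# descriptor stays valid. When the process dies (even by SIGKILL), the
# kernel drops the last reference and reclaims the space automatically.
os.unlink(path)

# Use the anonymous-but-still-open file as scratch space.
with os.fdopen(fd, "w+b") as scratch:
    scratch.write(b"intermediate data")
    scratch.seek(0)
    data = scratch.read()
    print(data.decode())  # prints "intermediate data"
```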
Given that this issue appears on a SAMBA filesystem (aka SMB) but not
on a POSIX filesystem, I'm guessing it has to do with SAMBA
infelicities. Like this old bug:
https://bugzilla.samba.org/show_bug.cgi?id=998 .
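If it is that unlink-while-open behavior, a quick probe like this sketch can confirm it without involving lmplz at all (`/tmp` here is a stand-in; point it at a directory on the SMB mount instead, e.g. the one passed to -T):

```python
import os
import sys
import tempfile

def probe_unlink_while_open(target_dir):
    """Return True if target_dir allows deleting a file that is still open,
    which is what the mkstemp/unlink temporary-file pattern requires."""
    fd, path = tempfile.mkstemp(dir=target_dir)
    try:
        os.unlink(path)           # the call SMB mounts have historically rejected
        os.write(fd, b"probe")    # writing after unlink must still work
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 5) == b"probe"
    except OSError as err:
        print(f"pattern unsupported on {target_dir}: {err}", file=sys.stderr)
        return False
    finally:
        os.close(fd)
        if os.path.exists(path):  # clean up if the unlink did not happen
            os.unlink(path)

if __name__ == "__main__":
    # Replace "/tmp" with the mount under test, e.g. a directory under /netshr
    print(probe_unlink_while_open("/tmp"))
```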
I'd like to make it work, but temporary files on SAMBA are a pretty low
priority. However, if you can provide a backtrace (after compiling with
"debug" added to the command), I can try to turn that segfault into an
error message.
Kenneth
On 10/29/2015 08:15 PM, Vincent Nguyen wrote:
>
> it's the same machine as in my last test ...
>
> let me explain :
>
> Master = Ubuntu 14.04, which is my original machine with Moses and all my
> other language tools in /home/moses
>
> I shared /home/moses as an SMB share named "mosesshare"
>
> then on the 2 new nodes, I mounted smb://master/mosesshare/ at /netshr
>
> did the same on master
>
> so cd /netshr shows the content of /home/moses absolutely perfectly on
> all 3 machines (master and 2 nodes)
>
> I think you should be able to replicate this without having to handle SGE or
> nodes. Just on 1 machine.
>
>
> On 29/10/2015 20:59, Kenneth Heafield wrote:
>> Yes.
>>
>> Also this is all very odd. What file system is /netshr ?
>>
>> On 10/29/2015 07:56 PM, Vincent Nguyen wrote:
>>> Hi,
>>> Do you think in the meantime can I just use -T with a local temporary
>>> directory ?
>>>
>>> -------- Forwarded Message --------
>>> Subject: Re: [Moses-support] Moses on SGE clarification
>>> Date: Thu, 29 Oct 2015 17:45:01 +0100
>>> From: Vincent Nguyen <[email protected]>
>>> To: [email protected]
>>>
>>>
>>>
>>> Ken,
>>>
>>> I just did some further testing on the master node that HAS everything
>>> installed.
>>> Same error as before.
>>>
>>> /netshr/mosesdecoder/bin/lmplz --text
>>> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>> /netshr/working-en-fr/lm -S 20%
>>>
>>> /netshr is a mount point for /home/moses/
>>>
>>> so what I did is replace /netshr/ with /home/moses/ in the
>>> first 2 instances => same error
>>>
>>> if I replace /netshr with /home/moses in the -T option,
>>> it works.
>>>
>>> so obviously there is an issue here
>>>
>>>
>>>
>>> On 29/10/2015 17:31, Kenneth Heafield wrote:
>>>> So we're clear, it runs correctly on the local machine but not when you
>>>> run it through SGE? In that case, I suspect it's library version
>>>> differences.
>>>>
>>>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>>>> I get this error :
>>>>>
>>>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz
>>>>> --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>>> /netshr/working-en-fr/lm -S 20%
>>>>> === 1/5 Counting and sorting n-grams ===
>>>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>
>>>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>>>> ****************************************************************************************************
>>>>>
>>>>> Segmentation fault (core dumped)
>>>>> moses@sgenode1:/netshr/working-en-fr$
>>>>>
>>>>> I installed libgoogle-perftools-dev but got the same error.
>>>>> Just to be clear, are all these packages below only necessary to build
>>>>> Moses, or do I need specific packages
>>>>> to run one binary or another ?
>>>>> confused....
>>>>>
>>>>>
>>>>> Ubuntu
>>>>>
>>>>> Install the following packages using the command
>>>>>
>>>>> sudo apt-get install [package name]
>>>>>
>>>>> Packages:
>>>>>
>>>>> g++
>>>>> git
>>>>> subversion
>>>>> automake
>>>>> libtool
>>>>> zlib1g-dev
>>>>> libboost-all-dev
>>>>> libbz2-dev
>>>>> liblzma-dev
>>>>> python-dev
>>>>> graphviz
>>>>> imagemagick
>>>>> libgoogle-perftools-dev (for tcmalloc)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>>>> Hi,
>>>>>>
>>>>>> make sure that all the paths are valid on all the nodes --- so
>>>>>> definitely no relative paths.
>>>>>> And of course, the binaries need to be executable on all nodes as
>>>>>> well.
>>>>>>
>>>>>> -phi
>>>>>>
>>>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> OK guys, not easy stuff ...
>>>>>> I fought to get the prerequisites working, but now at least
>>>>>> jobs start .....
>>>>>>
>>>>>> and crash.
>>>>>>
>>>>>> I'll post later the details of the preliminary steps, could
>>>>>> be useful.
>>>>>>
>>>>>> my crash is when lmplz starts.
>>>>>>
>>>>>> I have a share mounted on my nodes and all binaries are visible
>>>>>> from the nodes, including the lmplz program.
>>>>>>
>>>>>> but I was thinking, do I need to actually install some
>>>>>> packages on
>>>>>> the nodes themselves ? I mean packages that do not fall under
>>>>>> the /mosesdecoder/ folder ?
>>>>>>
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> V
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> these machine names are just there for convenience.
>>>>>>>
>>>>>>> If you want experiment.perl to submit jobs per qsub,
>>>>>>> all you have to do is to run experiment.perl with the
>>>>>>> additional switch "-cluster".
>>>>>>>
>>>>>>> You can also put the head node's name into the
>>>>>>> experiment.machines file, then you do not need to
>>>>>>> use the switch anymore.
>>>>>>>
>>>>>>> -phi
>>>>>>>
>>>>>>> On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> I need some clarification before screwing up some files.
>>>>>>> I just setup a SGE cluster with a Master + 2 Nodes.
>>>>>>>
>>>>>>> to make it clear, let's say my cluster name is "default",
>>>>>>> my master
>>>>>>> headnode is "master", my 2 other nodes are "node1" and
>>>>>>> "node2"
>>>>>>>
>>>>>>>
>>>>>>> for EMS :
>>>>>>>
>>>>>>> I opened the default experiment.machines file and I see :
>>>>>>>
>>>>>>> cluster: townhill seville hermes lion seville sannox
>>>>>>> lutzow
>>>>>>> frontend
>>>>>>> multicore-4: freddie
>>>>>>> multicore-8: tyr thor odin crom
>>>>>>> multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>>> multicore-24: syn hel skaol saga buri loki sif magni
>>>>>>> multicore-32: gna snotra lofn thrud
>>>>>>>
>>>>>>> townhill and the others are what ? machine / node names ?
>>>>>>> names of several clusters ?
>>>>>>> should I just put "default" or "master node1 node2" ?
>>>>>>>
>>>>>>> multicore-X: should I put machine names here
>>>>>>> if my 3 machines are 8 cores each
>>>>>>> multicore-8: master node1 node2
>>>>>>> right ?
>>>>>>>
>>>>>>>
>>>>>>> then in the config file for EMS:
>>>>>>>
>>>>>>> #generic-parallelizer =
>>>>>>> $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>>> #generic-parallelizer =
>>>>>>>
>>>>>>> $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>>>
>>>>>>> which one should I take if my nodes are multicore ?
>>>>>>> still the first one ?
>>>>>>>
>>>>>>>
>>>>>>> ### cluster settings (if run on a cluster machine)
>>>>>>> # number of jobs to be submitted in parallel
>>>>>>> #
>>>>>>> #jobs = 10
>>>>>>> should I count approx 1 job per core on the total cores
>>>>>>> of my
>>>>>>> 3 machines ?
>>>>>>>
>>>>>>> # arguments to qsub when scheduling a job
>>>>>>> #qsub-settings = ""
>>>>>>> can this stay empty ?
>>>>>>>
>>>>>>> # project for priviledges and usage accounting
>>>>>>> #qsub-project = iccs_smt
>>>>>>> standard value ?
>>>>>>>
>>>>>>> # memory and time
>>>>>>> #qsub-memory = 4
>>>>>>> #qsub-hours = 48
>>>>>>> 4 what ? GB ?
>>>>>>>
>>>>>>> ### multi-core settings
>>>>>>> # when the generic parallelizer is used, the number of
>>>>>>> cores
>>>>>>> # specified here
>>>>>>> cores = 4
>>>>>>> is this ignored if generic-parallelizer.perl is chosen ?
>>>>>>>
>>>>>>>
>>>>>>> is there a way to put more load on one specific node ?
>>>>>>>
>>>>>>> Many thanks,
>>>>>>> V.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>>
>>>>>