Hi,

        I handle temporary files by calling mkstemp, unlinking the file
right away, and then using the still-open descriptor.  That way the kernel
will still clean up if the process meets an untimely death.
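
For readers unfamiliar with the pattern, here is a minimal sketch of
mkstemp-then-unlink.  It is written in Python rather than the C++ that
lmplz uses, and the helper names (open_anonymous_temp, roundtrip) are made
up for illustration; this is not the lmplz source.  It assumes a POSIX-like
filesystem where an open file can be unlinked, which is the very assumption
that may not hold on an SMB mount.

```python
import os
import tempfile


def open_anonymous_temp(directory):
    """Create a temp file in `directory`, unlink it, and return (fd, old_path).

    After the unlink the file has no name, but the descriptor stays valid
    until it is closed or the process exits, at which point the kernel
    frees the storage.
    """
    fd, path = tempfile.mkstemp(prefix="scratch_", dir=directory)
    try:
        os.unlink(path)
    except OSError:
        # Some filesystems refuse to unlink an open file; report it
        # instead of carrying on with a half-initialised temp file.
        os.close(fd)
        raise
    return fd, path


def roundtrip(directory, payload):
    """Write `payload` to an unlinked temp file and read it back."""
    fd, path = open_anonymous_temp(directory)
    try:
        os.write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
    finally:
        os.close(fd)
    # The second value shows whether the old name is still visible.
    return data, os.path.exists(path)
```

Pointing roundtrip at a directory on the share (for example the directory
passed to -T) and at a local directory is one way to check whether the
filesystem tolerates the pattern at all.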

        Given that this issue appears on a Samba (SMB) filesystem but not
on a POSIX filesystem, I'm guessing it has to do with Samba
infelicities, like this old bug:
https://bugzilla.samba.org/show_bug.cgi?id=998

        I'd like to make it work, but temporary files on Samba are pretty low
priority.  However, if you can provide a backtrace (after compiling with
"debug" added to the command) I can try to turn that segfault into an
error message.

Kenneth

On 10/29/2015 08:15 PM, Vincent Nguyen wrote:
> 
> it's the same machine in my last test ...
> 
> let me explain :
> 
> Master = Ubuntu 14.04, which is my original machine with Moses and all my
> other language tools in /home/moses
> 
> I shared /home/moses over SMB as "mosesshare"
> 
> then on the 2 new nodes, I mounted smb://master/mosesshare/ on /netshr
> 
> and did the same on master
> 
> so cd /netshr shows the content of /home/moses absolutely perfectly on
> the 3 machines (master and 2 nodes)
> 
> I think you should be able to replicate this without having to handle SGE
> or nodes. Just on 1 machine.
> 
> 
> On 29/10/2015 20:59, Kenneth Heafield wrote:
>> Yes.
>>
>> Also this is all very odd.  What file system is /netshr ?
>>
>> On 10/29/2015 07:56 PM, Vincent Nguyen wrote:
>>> Hi,
>>> Do you think that in the meantime I can just use -T with a local
>>> temporary directory?
>>>
>>> -------- Forwarded Message --------
>>> Subject:     Re: [Moses-support] Moses on SGE clarification
>>> Date:     Thu, 29 Oct 2015 17:45:01 +0100
>>> From:     Vincent Nguyen <[email protected]>
>>> To:     [email protected]
>>>
>>>
>>>
>>> Ken,
>>>
>>> I just did some further testing on the master node, which HAS
>>> everything installed.
>>> Running the command as is gives the same error.
>>>
>>> /netshr/mosesdecoder/bin/lmplz --text
>>> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>> /netshr/working-en-fr/lm -S 20%
>>>
>>> /netshr is a mount point for /home/moses/
>>>
>>> So I replaced /netshr/ with /home/moses/ in the
>>> first 2 instances => same error
>>>
>>> If I replace /netshr with /home/moses in the -T option,
>>> it works.
>>>
>>> So obviously there is an issue here.
>>>
>>>
>>>
>>> On 29/10/2015 17:31, Kenneth Heafield wrote:
>>>> So we're clear, it runs correctly on the local machine but not when you
>>>> run it through SGE?  In that case, I suspect it's library version
>>>> differences.
>>>>
>>>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>>>> I get this error :
>>>>>
>>>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz
>>>>> --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>>> /netshr/working-en-fr/lm -S 20%
>>>>> === 1/5 Counting and sorting n-grams ===
>>>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>
>>>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>>>> ****************************************************************************************************
>>>>>
>>>>> Segmentation fault (core dumped)
>>>>> moses@sgenode1:/netshr/working-en-fr$
>>>>>
>>>>> I installed libgoogle-perftools-dev but I get the same error.
>>>>> Just to be clear, all the packages below are necessary to build
>>>>> Moses; do I need specific packages
>>>>> to run one binary or another?
>>>>> confused....
>>>>>
>>>>>
>>>>>         Ubuntu
>>>>>
>>>>> Install the following packages using the command
>>>>>
>>>>>      sudo apt-get install [package name]
>>>>>
>>>>> Packages:
>>>>>
>>>>>      g++
>>>>>      git
>>>>>      subversion
>>>>>      automake
>>>>>      libtool
>>>>>      zlib1g-dev
>>>>>      libboost-all-dev
>>>>>      libbz2-dev
>>>>>      liblzma-dev
>>>>>      python-dev
>>>>>      graphviz
>>>>>      imagemagick
>>>>>      libgoogle-perftools-dev (for tcmalloc)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>>>> Hi,
>>>>>>
>>>>>> make sure that all the paths are valid on all the nodes --- so
>>>>>> definitely no relative paths.
>>>>>> And of course, the binaries need to be executable on all nodes as
>>>>>> well.
>>>>>>
>>>>>> -phi
>>>>>>
>>>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>       OK guys, this is not easy stuff ...
>>>>>>       I fought to get the prerequisites working, but now at least
>>>>>>       jobs start .....
>>>>>>
>>>>>>       and crash.
>>>>>>
>>>>>>       I'll post the details of the preliminary steps later; they
>>>>>>       could be useful.
>>>>>>
>>>>>>       My crash happens when lmplz starts.
>>>>>>
>>>>>>       I have a share mounted on my nodes and all the binaries are
>>>>>>       visible from the nodes, including the lmplz program.
>>>>>>
>>>>>>       But I was wondering: do I need to actually install some
>>>>>>       packages on the nodes themselves? I mean packages that do not
>>>>>>       fall under the /mosesdecoder/ folder?
>>>>>>
>>>>>>
>>>>>>       thanks,
>>>>>>
>>>>>>       V
>>>>>>
>>>>>>
>>>>>>
>>>>>>       On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>>>       Hi,
>>>>>>>
>>>>>>>       these machine names are just there for convenience.
>>>>>>>
>>>>>>>       If you want experiment.perl to submit jobs per qsub,
>>>>>>>       all you have to do is to run experiment.perl with the
>>>>>>>       additional switch "-cluster".
>>>>>>>
>>>>>>>       You can also put the head node's name into the
>>>>>>>       experiment.machines file, then you do not need to
>>>>>>>       use the switch anymore.
>>>>>>>
>>>>>>>       -phi
>>>>>>>
>>>>>>>       On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen
>>>>>>>       <[email protected]> wrote:
>>>>>>>
>>>>>>>           Hi there,
>>>>>>>
>>>>>>>           I need some clarification before screwing up some files.
>>>>>>>           I just set up an SGE cluster with a Master + 2 Nodes.
>>>>>>>
>>>>>>>           To make it clear, let's say my cluster name is "default",
>>>>>>>           my master headnode is "master", and my 2 other nodes are
>>>>>>>           "node1" and "node2".
>>>>>>>
>>>>>>>
>>>>>>>           for EMS :
>>>>>>>
>>>>>>>           I opened the default experiment.machines file and I see :
>>>>>>>
>>>>>>>           cluster: townhill seville hermes lion seville sannox lutzow frontend
>>>>>>>           multicore-4: freddie
>>>>>>>           multicore-8: tyr thor odin crom
>>>>>>>           multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>>>           multicore-24: syn hel skaol saga buri loki sif magni
>>>>>>>           multicore-32: gna snotra lofn thrud
>>>>>>>
>>>>>>>           What are townhill and the others? Names of machines / nodes?
>>>>>>>           Names of several clusters?
>>>>>>>           Should I just put "default" or "master node1 node2"?
>>>>>>>
>>>>>>>           multicore-X: should I put machine names here
>>>>>>>           if my 3 machines are 8 cores each
>>>>>>>           multicore-8: master node1 node2
>>>>>>>           right ?
>>>>>>>
>>>>>>>
>>>>>>>           then in the config file for EMS:
>>>>>>>
>>>>>>>           #generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>>>           #generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>>>
>>>>>>>           Which one should I take if my nodes are multicore?
>>>>>>>           Still the first one?
>>>>>>>
>>>>>>>
>>>>>>>           ### cluster settings (if run on a cluster machine)
>>>>>>>           # number of jobs to be submitted in parallel
>>>>>>>           #
>>>>>>>           #jobs = 10
>>>>>>>           Should I count approximately 1 job per core across the
>>>>>>>           total cores of my 3 machines?
>>>>>>>
>>>>>>>           # arguments to qsub when scheduling a job
>>>>>>>           #qsub-settings = ""
>>>>>>>           can this stay empty ?
>>>>>>>
>>>>>>>           # project for priviledges and usage accounting
>>>>>>>           #qsub-project = iccs_smt
>>>>>>>           standard value ?
>>>>>>>
>>>>>>>           # memory and time
>>>>>>>           #qsub-memory = 4
>>>>>>>           #qsub-hours = 48
>>>>>>>           4 what ? GB ?
>>>>>>>
>>>>>>>           ### multi-core settings
>>>>>>>           # when the generic parallelizer is used, the number of cores
>>>>>>>           # specified here
>>>>>>>           cores = 4
>>>>>>>           is this ignored if generic-parallelizer.perl is chosen ?
>>>>>>>
>>>>>>>
>>>>>>>           is there a way to put more load on one specific node ?
>>>>>>>
>>>>>>>           Many thanks,
>>>>>>>           V.
>>>>>>>
>>>>>>>
>>>>>>>           _______________________________________________
>>>>>>>           Moses-support mailing list
>>>>>>>           [email protected] <mailto:[email protected]>
>>>>>>>           http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>>
