I'll mount NFS instead and will confirm whether it works.
Thanks

On 29/10/2015 21:31, Kenneth Heafield wrote:
> Hi,
>
>       The way I do temporary files is mkstemp, unlink, and then use them.
> That way the kernel will still clean up if the process meets an untimely
> death.
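>
>       For illustration, a minimal sketch of that pattern in C (not the
> actual kenlm code):
>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <unistd.h>
>
>     int main() {
>       char name[] = "/tmp/scratch_XXXXXX"; /* template must end in XXXXXX */
>       int fd = mkstemp(name);              /* creates and opens the file */
>       if (fd == -1) { perror("mkstemp"); return 1; }
>       unlink(name); /* drop the directory entry; the fd stays usable */
>       /* ... use fd as scratch space; the kernel reclaims the storage
>          when the fd is closed, even if the process dies first. */
>       write(fd, "scratch\n", 8);
>       close(fd);
>       return 0;
>     }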
>
>       Given that this issue appears on a SAMBA filesystem (aka SMB) but not
> on a POSIX filesystem, I'm guessing it has to do with SAMBA
> infelicities.  Like this old bug:
> https://bugzilla.samba.org/show_bug.cgi?id=998 .
>
>       I'd like to make it work, but temporary files on SAMBA are a pretty
> low priority.  However, if you can provide a backtrace (after compiling with
> "debug" added to the command) I can try to turn that segfault into an
> error message.
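>
>       A standard way to capture one, assuming gdb is available: rebuild
> with "debug", then run the same lmplz command under the debugger and
> print the stack once it crashes:
>
>     gdb --args /netshr/mosesdecoder/bin/lmplz --text ... -T /netshr/working-en-fr/lm
>     (gdb) run
>     (gdb) bt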
>
> Kenneth
>
> On 10/29/2015 08:15 PM, Vincent Nguyen wrote:
>> It's the same machine in my last test ...
>>
>> Let me explain:
>>
>> Master = Ubuntu 14.04, which is my original machine with Moses and all my
>> other language tools in /home/moses.
>>
>> I shared /home/moses over SMB as "mosesshare".
>>
>> Then on 2 new nodes, I mounted smb://master/mosesshare/ at /netshr,
>>
>> and did the same on master,
>>
>> so cd /netshr shows the content of /home/moses absolutely perfectly on
>> the 3 machines (master and 2 nodes).
>>
>> I think you should be able to replicate this without having to handle
>> SGE or nodes, just on 1 machine.
>>
>>
>> On 29/10/2015 20:59, Kenneth Heafield wrote:
>>> Yes.
>>>
>>> Also, this is all very odd.  What file system is /netshr?
>>>
>>> On 10/29/2015 07:56 PM, Vincent Nguyen wrote:
>>>> Hi,
>>>> Do you think, in the meantime, I can just use -T with a local temporary
>>>> directory?
>>>>
>>>> -------- Forwarded Message --------
>>>> Subject:     Re: [Moses-support] Moses on SGE clarification
>>>> Date:     Thu, 29 Oct 2015 17:45:01 +0100
>>>> From:     Vincent Nguyen <[email protected]>
>>>> To:     [email protected]
>>>>
>>>>
>>>>
>>>> Ken,
>>>>
>>>> I just did some further testing on the master node, which HAS everything
>>>> installed.
>>>> Exactly the same error.
>>>>
>>>> /netshr/mosesdecoder/bin/lmplz --text
>>>> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>> /netshr/working-en-fr/lm -S 20%
>>>>
>>>> /netshr is a mount point of /home/moses/
>>>>
>>>> So what I did: I replaced /netshr/ with /home/moses/ in the
>>>> first 2 instances => same error.
>>>>
>>>> If I replace /netshr with /home/moses in the -T option,
>>>> it works.
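>>>>
>>>> i.e. this invocation works, with only the -T argument changed:
>>>>
>>>> /netshr/mosesdecoder/bin/lmplz --text
>>>> /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>> /home/moses/working-en-fr/lm -S 20%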
>>>>
>>>> So obviously there is an issue here.
>>>>
>>>>
>>>>
>>>> On 29/10/2015 17:31, Kenneth Heafield wrote:
>>>>> So we're clear, it runs correctly on the local machine but not when you
>>>>> run it through SGE?  In that case, I suspect it's library version
>>>>> differences.
>>>>>
>>>>> On 10/29/2015 03:09 PM, Vincent Nguyen wrote:
>>>>>> I get this error:
>>>>>>
>>>>>> moses@sgenode1:/netshr/working-en-fr$ /netshr/mosesdecoder/bin/lmplz
>>>>>> --text /netshr/working-en-fr/lm/europarl.truecased.7 --order 5 --arpa
>>>>>> /netshr/working-en-fr/lm/europarl.lm.7 --prune 0 0 1 -T
>>>>>> /netshr/working-en-fr/lm -S 20%
>>>>>> === 1/5 Counting and sorting n-grams ===
>>>>>> Reading /netshr/working-en-fr/lm/europarl.truecased.7
>>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>>
>>>>>> tcmalloc: large alloc 2755821568 bytes == 0x25d28000 @
>>>>>> ****************************************************************************************************
>>>>>>
>>>>>> Segmentation fault (core dumped)
>>>>>> moses@sgenode1:/netshr/working-en-fr$
>>>>>>
>>>>>> I installed libgoogle-perftools-dev, but I get the same error.
>>>>>> Just to be clear: all the packages below are needed just to build
>>>>>> Moses; do I need specific packages
>>>>>> to run one binary or another?
>>>>>> Confused....
>>>>>>
>>>>>>
>>>>>>          Ubuntu
>>>>>>
>>>>>> Install the following packages using the command
>>>>>>
>>>>>>       sudo apt-get install [package name]
>>>>>>
>>>>>> Packages:
>>>>>>
>>>>>>       g++
>>>>>>       git
>>>>>>       subversion
>>>>>>       automake
>>>>>>       libtool
>>>>>>       zlib1g-dev
>>>>>>       libboost-all-dev
>>>>>>       libbz2-dev
>>>>>>       liblzma-dev
>>>>>>       python-dev
>>>>>>       graphviz
>>>>>>       imagemagick
>>>>>>       libgoogle-perftools-dev (for tcmalloc)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 29/10/2015 15:18, Philipp Koehn wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> make sure that all the paths are valid on all the nodes --- so
>>>>>>> definitely no relative paths.
>>>>>>> And of course, the binaries need to be executable on all nodes as
>>>>>>> well.
>>>>>>>
>>>>>>> -phi
>>>>>>>
>>>>>>> On Thu, Oct 29, 2015 at 10:12 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>>>
>>>>>>>        OK guys, not easy stuff ...
>>>>>>>        I fought to get the prerequisites working, but now at least
>>>>>>>        jobs start .....
>>>>>>>
>>>>>>>        and crash.
>>>>>>>
>>>>>>>        I'll post the details of the preliminary steps later; they
>>>>>>> could be useful.
>>>>>>>
>>>>>>>        My crash is when lmplz starts.
>>>>>>>
>>>>>>>        I have a share mounted on my nodes and all binaries are visible
>>>>>>>        from the nodes, including the lmplz program.
>>>>>>>
>>>>>>>        But I was thinking: do I need to actually install some
>>>>>>> packages on
>>>>>>>        the nodes themselves? I mean packages that do not fall under
>>>>>>>        the /mosesdecoder/ folder?
>>>>>>>
>>>>>>>
>>>>>>>        thanks,
>>>>>>>
>>>>>>>        V
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>        On 29/10/2015 13:26, Philipp Koehn wrote:
>>>>>>>>        Hi,
>>>>>>>>
>>>>>>>>        these machine names are just there for convenience.
>>>>>>>>
>>>>>>>>        If you want experiment.perl to submit jobs per qsub,
>>>>>>>>        all you have to do is to run experiment.perl with the
>>>>>>>>        additional switch "-cluster".
>>>>>>>>
>>>>>>>>        You can also put the head node's name into the
>>>>>>>>        experiment.machines file, then you do not need to
>>>>>>>>        use the switch anymore.
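>>>>>>>>
>>>>>>>>        For a setup like yours, that would presumably be
>>>>>>>>        something like (assuming the 8-core machines you describe):
>>>>>>>>
>>>>>>>>        cluster: master
>>>>>>>>        multicore-8: master node1 node2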
>>>>>>>>
>>>>>>>>        -phi
>>>>>>>>
>>>>>>>> On Wed, Oct 28, 2015 at 10:20 AM, Vincent Nguyen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>            Hi there,
>>>>>>>>
>>>>>>>>            I need some clarification before screwing up some files.
>>>>>>>>            I just set up an SGE cluster with a Master + 2 Nodes.
>>>>>>>>
>>>>>>>>            To make it clear, let's say my cluster name is "default",
>>>>>>>>            my master headnode is "master", and my 2 other nodes are
>>>>>>>>            "node1" and "node2".
>>>>>>>>
>>>>>>>>
>>>>>>>>            for EMS :
>>>>>>>>
>>>>>>>>            I opened the default experiment.machines file and I see:
>>>>>>>>
>>>>>>>>            cluster: townhill seville hermes lion seville sannox
>>>>>>>> lutzow
>>>>>>>>            frontend
>>>>>>>>            multicore-4: freddie
>>>>>>>>            multicore-8: tyr thor odin crom
>>>>>>>>            multicore-16: saxnot vali vili freyja bragi hoenir
>>>>>>>>            multicore-24: syn hel skaol saga buri loki sif magni
>>>>>>>>            multicore-32: gna snotra lofn thrud
>>>>>>>>
>>>>>>>>            townhill and the others are what? Machine/node names?
>>>>>>>>            Names of several clusters?
>>>>>>>>            Should I just put "default" or "master node1 node2"?
>>>>>>>>
>>>>>>>>            multicore-X: should I put machine names here?
>>>>>>>>            If my 3 machines are 8 cores each, then
>>>>>>>>            multicore-8: master node1 node2
>>>>>>>>            right?
>>>>>>>>
>>>>>>>>
>>>>>>>>            then in the config file for EMS:
>>>>>>>>
>>>>>>>>            #generic-parallelizer =
>>>>>>>>            $moses-script-dir/ems/support/generic-parallelizer.perl
>>>>>>>>            #generic-parallelizer =
>>>>>>>>           
>>>>>>>> $moses-script-dir/ems/support/generic-multicore-parallelizer.perl
>>>>>>>>
>>>>>>>>            Which one should I take if my nodes are multicore?
>>>>>>>>            Still the first one?
>>>>>>>>
>>>>>>>>
>>>>>>>>            ### cluster settings (if run on a cluster machine)
>>>>>>>>            # number of jobs to be submitted in parallel
>>>>>>>>            #
>>>>>>>>            #jobs = 10
>>>>>>>>            Should I count approx. 1 job per core over the total cores
>>>>>>>>            of my 3 machines?
>>>>>>>>
>>>>>>>>            # arguments to qsub when scheduling a job
>>>>>>>>            #qsub-settings = ""
>>>>>>>>            Can this stay empty?
>>>>>>>>
>>>>>>>>            # project for priviledges and usage accounting
>>>>>>>>            #qsub-project = iccs_smt
>>>>>>>>            Is this a standard value?
>>>>>>>>
>>>>>>>>            # memory and time
>>>>>>>>            #qsub-memory = 4
>>>>>>>>            #qsub-hours = 48
>>>>>>>>            4 what? GB?
>>>>>>>>
>>>>>>>>            ### multi-core settings
>>>>>>>>            # when the generic parallelizer is used, the number of
>>>>>>>> cores
>>>>>>>>            # specified here
>>>>>>>>            cores = 4
>>>>>>>>            Is this ignored if generic-parallelizer.perl is chosen?
>>>>>>>>
>>>>>>>>
>>>>>>>>            Is there a way to put more load on one specific node?
>>>>>>>>
>>>>>>>>            Many thanks,
>>>>>>>>            V.
>>>>>>>>
>>>>>>>>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
