On 19/04/13 02:46 PM, Marcelino Suzuki wrote:
>          Hi Sebastien
>
>          Well, I was able to go through with the assembly with Ray 2.2.0 and hit
> the wall with 1 node 9 ranks and 2 nodes 16 ranks during
> BiologicalAbundances.   I am still having issues with multiple nodes/
> cores.  I tried several combinations and the process gets killed after
> ~30 min. I think you call ranks the number of processes you send via
> mpirun -np, correct?

Yes, an MPI (Message Passing Interface) rank is usually a process running the
application.

> Perhaps you might have some guesses as to why.  The
> error message I got with 3 nodes and 25 cores (llsubmit # @
> total_tasks = 25) is:
>
> mpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
>           node017
>           node028
> [node028:03832] [[28276,0],2] routed:binomial: Connection to lifeline
> [[28276,0],0] lost
> [node017:03879] [[28276,0],1] routed:binomial: Connection to lifeline
> [[28276,0],0] lost
>

This is sometimes a byproduct of a node running out of memory.
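
If you want to test that hypothesis, the "assembler memory usage" lines that Ray prints can be tallied per rank. Below is a rough Python sketch, not part of Ray: the function names are made up for illustration, the sample lines only mimic the format seen in your output (the Rank 1 line is invented), and it assumes that the sum of the per-rank peaks on one node is a fair first estimate of what that node needs.

```python
import re
from collections import defaultdict

# Matches lines such as "Rank 0: assembler memory usage: 103716 KiB".
MEMORY_LINE = re.compile(r"Rank (\d+): assembler memory usage: (\d+) KiB")


def peak_memory_kib(log_lines):
    """Return {rank: peak KiB} from memory usage report lines."""
    peaks = defaultdict(int)
    for line in log_lines:
        match = MEMORY_LINE.search(line)
        if match:
            rank, kib = int(match.group(1)), int(match.group(2))
            peaks[rank] = max(peaks[rank], kib)
    return dict(peaks)


def node_total_gib(peaks, ranks_on_node):
    """Sum the peaks of the ranks placed on one node, converted to GiB."""
    return sum(peaks[rank] for rank in ranks_on_node) / (1024 * 1024)


if __name__ == "__main__":
    # Illustrative lines only; in practice read them from the output files.
    sample = [
        "Rank 0: assembler memory usage: 37992 KiB",
        "Rank 0: assembler memory usage: 103716 KiB",
        "Rank 1: assembler memory usage: 142416 KiB",
    ]
    peaks = peak_memory_kib(sample)
    print(peaks)
    print(f"{node_total_gib(peaks, [0, 1]):.3f} GiB on the node")
```

Compare the total for the ranks sharing a node with the RAM of that node; if the two are close, fewer ranks per node or more nodes may help.
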

>          Any help very welcome
>
>          Marcelino
>
>
> On Apr 17, 2013, at 1:57 PM, Sébastien Boisvert wrote:
>
>> See my responses below.
>>
>> On 16/04/13 06:16 PM, Marcelino Suzuki wrote:
>>>          Hurrah it compiled, but could you please take a look at
>>> "HELP!" below
>>>
>>>
>>>          Well, after I picked the intel compiler and added the
>>> -DMPICH_IGNORE_CXX_SEEK option in the Makefile of RayPlatform
>>>
>>>           $(Q)$(MPICXX) $(CXXFLAGS) -DMPICH_IGNORE_CXX_SEEK -D RAYPLATFORM_VERSION=\"$(RAYPLATFORM_VERSION)\" -I. -c -o $@ $<
>>>
>>>          and the Makefile of ray
>>>
>>>          find . -name '*Makefile*' | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>
>>>          It did compile through.
>>>
>>>
>>>
>>> ===     HELP!
>>>
>>>          I tried to run jobs with the newly compiled 2.2.0, but
>>> unfortunately I still get the "Error, crambe/ already exists, change
>>> the -o parameter to another value." error
>>>
>>>          I get two types of errors:
>>>
>>>          Whenever I start one node with 9 cores it crashes after some 20
>>> minutes during assembly, so I am guessing it is some RAM issue,
>>> although I am "only" running 3.2 M sequences.  I do get the "Error,
>>> crambe/ already exists, change the -o parameter to another value. "
>>> but it does not seem to affect it.  Actually, when I use individual
>>> outfiles per core, only one of the files gets written to.
>>>
>>>
>>>          When I spread among different nodes, i.e. 4 nodes with 4 jobs per
>>
>> You mean 4 MPI ranks per node, right?
>>
>>> node, the job cancels within a minute or so, and individual outfiles
>>> for most of the cores have the following last lines:
>>>
>>> Enabling CorePlugin 'TaxonomyViewer'
>>> Enabling memory usage reporting.
>>> Error, crambe/ already exists, change the -o parameter to another
>>> value.
>>>
>>
>> What is your file system? And is it shared between your nodes?
>>
>>> Rank 0: assembler memory usage: 37992 KiB
>>> Rank 0: assembler memory usage: 103716 KiB
>>> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047
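
One thing worth checking in these files: the output pasted here reports "Rank= 0 Size= 1". Assuming that Size is the total number of MPI ranks in the job, 4 nodes with 4 ranks each should report Size= 16 and ranks 0 to 15. A small Python sketch of that check follows; the helper names are hypothetical and it relies only on the line format shown above.

```python
import re

# Matches the rank and size in lines such as
# "Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047".
SIZE_LINE = re.compile(r"Rank=\s*(\d+)\s+Size=\s*(\d+)")


def reported_rank_sizes(log_lines):
    """Return the set of (rank, size) pairs found in the output lines."""
    pairs = set()
    for line in log_lines:
        match = SIZE_LINE.search(line)
        if match:
            pairs.add((int(match.group(1)), int(match.group(2))))
    return pairs


def looks_like_one_job(log_lines, expected_ranks):
    """True if ranks 0..expected_ranks-1 all report Size == expected_ranks."""
    pairs = reported_rank_sizes(log_lines)
    ranks = {rank for rank, _ in pairs}
    sizes = {size for _, size in pairs}
    return sizes == {expected_ranks} and ranks == set(range(expected_ranks))


if __name__ == "__main__":
    # Sixteen output files that each contain the line from this thread.
    lines = ["Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047"] * 16
    print(looks_like_one_job(lines, expected_ranks=16))
```

If the check fails because every process reports Size= 1, a common cause is launching with an mpirun that belongs to a different MPI library than the one the program was compiled against, so that each process starts as its own one-rank job.
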
>>>
>>>          Nodes (08,11) have no errors and finish by
>>>
>>> Rank 0 is loading sequence reads
>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy
>>> loading (please wait...)
>>> Rank 0 has 0 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>> Rank 0 has 100000 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>> Rank 0 has 200000 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>>
>>>          Nodes 09,10, have no errors and finish by
>>>
>>> Rank 0 is loading sequence reads
>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy
>>> loading (please wait...)
>>>
>>>          Marcelino
>>>
>>> On Apr 16, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>
>>>> To compile the git version:
>>>>
>>>>
>>>> git clone git://github.com/sebhtml/ray.git
>>>> git clone git://github.com/sebhtml/RayPlatform.git
>>>>
>>>> cd ray
>>>> make
>>>>
>>>> On 16/04/13 04:09 PM, Marcelino Suzuki wrote:
>>>>>       Hello and thanks
>>>>>
>>>>>       Just so that you know, I am getting the same type of compilation
>>>>> errors with other .cpp files as well (after I excluded Amos.cpp):
>>>>> code/plugin_CoverageGatherer/CoverageGatherer.cpp (at least, perhaps
>>>>> others)
>>>>>
>>>>>       I will wait for the new version, and will try on a different
>>>>> cluster where I compiled with gcc.
>>>>>
>>>>>       Marcelino
>>>>>
>>>>> On Apr 16, 2013, at 10:00 PM, Sébastien Boisvert wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> v2.2.0 should be released soon. There is one ticket left.
>>>>>>
>>>>>> Otherwise, I don't think this issue is critical as it does not
>>>>>> prevent assemblies from completing.
>>>>>>
>>>>>> On 16/04/13 03:35 PM, Marcelino Suzuki wrote:
>>>>>>>     Hello again
>>>>>>>
>>>>>>>     Well, I do not really know how to add the changes using git (other
>>>>>>> than editing the .cpp files from v2.1.0, which are not quite the same
>>>>>>> as the ones in the commit).  I tried to clone the latest ray and
>>>>>>> RayPlatform, and after some tweaking to use the intel compiler
>>>>>>> (suggested by my sys admin) the compilation crashed at
>>>>>>> plugin_Amos.cpp.  Maybe you can give me some pointers on how to fix
>>>>>>> 2.1.0 (which I was able to compile).
>>>>>>>
>>>>>>>     Thanks
>>>>>>>
>>>>>>> commands
>>>>>>>                   cd
>>>>>>>                   git clone https://bitbucket.org/sebhtml/ray.git
>>>>>>>                   mv ray /work/OOBMECO/bin
>>>>>>>                   git clone https://bitbucket.org/sebhtml/rayplatform.git
>>>>>>>                   mv rayplatform/ /work/OOBMECO/bin
>>>>>>>                   cd /work/OOBMECO/bin/ray/
>>>>>>>                   source /opt/cluster/gcc-4.4.3/refresh.sh
>>>>>>>                   source /opt/cluster/compilers/intel/Compiler/11.1/072/bin/iccvars.sh intel64
>>>>>>>                   source /opt/cluster/compilers/intel/impi/4.0.0.028/bin64/mpivars.sh
>>>>>>>                   find . -name '*Makefile*' | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>>>
>>>>>>>             edited the RayPlatform Makefile and added -DMPICH_IGNORE_CXX_SEEK
>>>>>>>
>>>>>>>             make PREFIX=$PWD
>>>>>>>
>>>>>>> ERROR:
>>>>>>>
>>>>>>> CXX code/plugin_Amos/Amos.o
>>>>>>> code/plugin_Amos/Amos.cpp: In member function â:
>>>>>>> code/plugin_Amos/Amos.cpp:239: error: expected primary-expression
>>>>>>> before â token
>>>>>>> code/plugin_Amos/Amos.cpp:239: error: â was not declared in this
>>>>>>> scope
>>>>>>> code/plugin_Amos/Amos.cpp:240: error: expected primary-expression
>>>>>>> before â token
>>>>>>> code/plugin_Amos/Amos.cpp:240: error: â was not declared in this
>>>>>>> scope
>>>>>>> make: *** [code/plugin_Amos/Amos.o] Error 1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ============================================================================
>>>>>>>               oOOOOo             Marcelino Suzuki,  Assoc. Professor, Intl.  Chair, Platform Responsible
>>>>>>>             oOOO              Univ Pierre Marie Curie (Paris 6) - Observatoire Océanologique de Banyuls
>>>>>>>          oOOOOOo.                       UMR 7621 - Laboratoire d'Océanographie Microbienne (LOMIC)
>>>>>>>       .oOOOOOOOo.                        Marine Biodiversity and Biotechnology (bio2mar) Platform
>>>>>>>     .oOOOOOOOOOoo.      suz...@obs-banyuls.fr    http://bit.ly/fq3nbE    bio2mar.obs-banyuls.fr
>>>>>>> .oOOOOOOOOOOOooooo.   Ave du Fontaulé, Banyuls-sur-Mer 66650, France    +33(0)430192401
>>>>>>> 0000000000000000000000000000000000000000000000000000000000000000000000000000
>>>>>>>
>>>>>>> On Apr 16, 2013, at 3:40 PM, Sébastien Boisvert wrote:
>>>>>>>
>>>>>>>> On 15/04/13 04:40 PM, Marcelino Suzuki wrote:
>>>>>>>>>   Hello and thanks for the answer.
>>>>>>>>>
>>>>>>>>>   I did understand that, and the directory does not exist when I launch
>>>>>>>>> the job.  I do not get this error from the first 4 cores, but after
>>>>>>>>> core 5, I get the error in the output from all cores.  The weirdest is
>>>>>>>>> that I had these messages in the outfile of a run that completed, and
>>>>>>>>> that is why I was wondering if that is normal.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> This is a race condition affecting version 2.1.0.
>>>>>>>>
>>>>>>>> It was fixed by this commit:
>>>>>>>>
>>>>>>>> commit 73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>> Author: Sébastien Boisvert <sebastien.boisver...@ulaval.ca>
>>>>>>>> Date:   Fri Feb 8 23:55:34 2013 -0500
>>>>>>>>
>>>>>>>>     fix a race condition during directory probing
>>>>>>>>         Resolves-bug: https://github.com/sebhtml/ray/issues/125
>>>>>>>>     Signed-off-by: Sébastien Boisvert <sebastien.boisvert.3...@ulaval.ca>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://bitbucket.org/sebhtml/ray/commits/73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>
>>>>>>>>
>>>>>>>> The upcoming v2.2.0 release will include this bug fix.
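
For readers wondering what a race during directory probing looks like in general: if several processes first test whether the output directory exists and then create it, two of them can both see "missing" and then collide. The Python sketch below is not the code from the commit; it only illustrates the usual remedy of letting a single atomic mkdir call decide the winner. The function name is hypothetical and "crambe" is borrowed from this thread.

```python
import os
import tempfile
import threading


def claim_output_directory(path):
    """Try to create path; return True only for the caller that created it.

    os.mkdir either creates the directory or raises FileExistsError, so
    there is no gap between "check" and "create" for a competitor to use.
    """
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        target = os.path.join(parent, "crambe")
        results = []

        def worker():
            # Each thread stands in for one process probing the directory.
            results.append(claim_output_directory(target))

        threads = [threading.Thread(target=worker) for _ in range(8)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        print("winners:", results.count(True))
```

Only the caller that gets True should go on to write into the directory; the others can treat False as "someone else already created it" rather than as a fatal error, provided they belong to the same job.
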
>>>>>>>>
>>>>>>>>
>>>>>>>> This bug was introduced by the mini-ranks code.
>>>>>>>>
>>>>>>>>
>>>>>>>> For example, if you are using a single machine with 32 cores,
>>>>>>>> you can use
>>>>>>>>
>>>>>>>>     mpiexec -n 32 Ray ...
>>>>>>>>
>>>>>>>> or you can use
>>>>>>>>
>>>>>>>>     mpiexec -n 4 Ray -mini-ranks-per-rank 7 ...
>>>>>>>>
>>>>>>>>
>>>>>>>> Mini-ranks are experimental, but they seem to work nicely on SMP
>>>>>>>> NUMA machines.
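
The arithmetic behind the two commands can be written as a tiny Python sketch. The assumption, inferred from 4 x (7 + 1) = 32 in the example above, is that each MPI rank runs its mini-ranks plus one extra thread for communication; the function names are hypothetical and not part of Ray.

```python
def threads_per_machine(mpi_ranks, mini_ranks_per_rank, comm_threads_per_rank=1):
    """Total threads started, assuming one communication thread per rank."""
    return mpi_ranks * (mini_ranks_per_rank + comm_threads_per_rank)


def mini_ranks_for(cores, mpi_ranks, comm_threads_per_rank=1):
    """Mini-ranks per rank so that all threads exactly fill the cores."""
    if cores % mpi_ranks != 0:
        raise ValueError("cores must be a multiple of mpi_ranks")
    per_rank = cores // mpi_ranks - comm_threads_per_rank
    if per_rank < 1:
        raise ValueError("not enough cores per rank for any mini-rank")
    return per_rank


if __name__ == "__main__":
    # The 32-core example: 4 ranks, each with 7 mini-ranks.
    print(threads_per_machine(4, 7))
    print(mini_ranks_for(32, 4))
```

Under that assumption, a machine with 32 cores and 4 ranks gets 7 mini-ranks per rank, which matches the command shown above.
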
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>   Marcelino
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 15, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>>>>>
>>>>>>>>>> On 14/04/13 04:45 PM, Marcelino Suzuki wrote:
>>>>>>>>>>>         Hello Sebastien
>>>>>>>>>>>
>>>>>>>>>>>         I am having issues with assembling a 3.2 M read Ion Torrent
>>>>>>>>>>> dataset using LoadLeveler on an Intel iDataPlex cluster, with the MPI
>>>>>>>>>>> job crashing during execution.  It might be an issue with the cluster
>>>>>>>>>>> (I got this message from the system administrator: "At the moment we
>>>>>>>>>>> are having problems with the IB network and the disk array"), but since
>>>>>>>>>>> I've seen an error in the output files coming from some cores using the
>>>>>>>>>>> -output-filename option, I just want to make sure I am not missing some
>>>>>>>>>>> detail:
>>>>>>>>>>>
>>>>>>>>>>>         Error, crambe/ already exists, change the -o parameter to
>>>>>>>>>>> another value
>>>>>>>>>>>
>>>>>>>>>>>         crambe is my -o directory
>>>>>>>>>>>
>>>>>>>>>>>         Is that normal behavior?  I noticed I got the same errors in an
>>>>>>>>>>> outfile of the same assembly that did go through to the end.
>>>>>>>>>>
>>>>>>>>>> Ray will not overwrite an existing directory.
>>>>>>>>>>
>>>>>>>>>> You must use another directory name.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>         Thanks
>>>>>>>>>>>
>>>>>>>>>>>         Marcelino
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>> Precog is a next-generation analytics platform capable of
>>>>>>>>>>> advanced
>>>>>>>>>>> analytics on semi-structured data. The platform includes APIs
>>>>>>>>>>> for
>>>>>>>>>>> building
>>>>>>>>>>> apps and a phenomenal toolset for data science. Developers
>>>>>>>>>>> can
>>>>>>>>>>> use
>>>>>>>>>>> our toolset for easy data analysis & visualization. Get a
>>>>>>>>>>> free
>>>>>>>>>>> account!
>>>>>>>>>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Denovoassembler-users mailing list
>>>>>>>>>>> Denovoassembler-users@lists.sourceforge.net
>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-
>>>>>>>>>>> users
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

