On 19/04/13 02:46 PM, Marcelino Suzuki wrote:
> Hi Sebastien
>
> Well, I was able to go through with the assembly with ray2.2.0 and hit
> the wall with 1 node 9 ranks and 2 nodes 16 ranks during
> BiologicalAbundances. I am still having issues with multiple nodes/
> cores. I tried several combinations and the process gets killed after
> ~30 min. I think you call ranks the number of processes you send via
> mpirun -np, correct?
Yes, an MPI (Message Passing Interface) rank is usually a process running the
application.
> Perhaps you might have some guesses as to why. The
> error message I got with 3 nodes and 25 cores (llsubmit # @
> total_tasks = 25) is:
>
> mpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> node017
> node028
> [node028:03832] [[28276,0],2] routed:binomial: Connection to lifeline
> [[28276,0],0] lost
> [node017:03879] [[28276,0],1] routed:binomial: Connection to lifeline
> [[28276,0],0] lost
>
This is sometimes a byproduct of a node running out of memory.
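
One way to check the memory hypothesis is to look at the highest "assembler memory usage" value that each rank reported before the job was killed. Below is a minimal sketch in Python, assuming the log lines have the same form as the ones quoted further down in this thread ("Rank 0: assembler memory usage: 142416 KiB"). The function name `peak_memory_kib` is hypothetical and is not part of Ray.

```python
import re
from collections import defaultdict

# Matches lines such as "Rank 0: assembler memory usage: 142416 KiB".
MEMORY_LINE = re.compile(r"Rank (\d+): assembler memory usage: (\d+) KiB")


def peak_memory_kib(log_lines):
    """Return {rank: highest reported memory usage in KiB} (hypothetical helper)."""
    peaks = defaultdict(int)
    for line in log_lines:
        match = MEMORY_LINE.search(line)
        if match:
            rank, kib = int(match.group(1)), int(match.group(2))
            peaks[rank] = max(peaks[rank], kib)
    return dict(peaks)


# Sample lines copied from the output quoted in this thread.
sample = [
    "Rank 0: assembler memory usage: 37992 KiB",
    "Rank 0: assembler memory usage: 103716 KiB",
    "Rank 0 has 100000 sequence reads",
    "Rank 0: assembler memory usage: 142416 KiB",
]
print(peak_memory_kib(sample))
```

Multiplying the per-rank peak by the number of ranks placed on a node gives a rough lower bound to compare against that node's RAM.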
> Any help very welcome
>
> Marcelino
>
>
> On Apr 17, 2013, at 1:57 PM, Sébastien Boisvert wrote:
>
>> See my responses below.
>>
>> On 16/04/13 06:16 PM, Marcelino Suzuki wrote:
>>> Hurrah it compiled, but could you please take a look at
>>> "HELP!" below
>>>
>>>
>>> Well, after I picked the intel compiler and added the
>>> -DMPICH_IGNORE_CXX_SEEK option in the Makefile of RayPlatform
>>>
>>> $(Q)$(MPICXX) $(CXXFLAGS) -DMPICH_IGNORE_CXX_SEEK -D RAYPLATFORM_VERSION=\"$(RAYPLATFORM_VERSION)\" -I. -c -o $@ $<
>>>
>>> and the Makefile of ray
>>>
>>> find . -name *Makefile* | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>
>>> It did compile through.
>>>
>>>
>>>
>>> === HELP!
>>>
>>> I tried to run jobs with the newly compiled 2.2.0 but
>>> unfortunately I still get the "Error, crambe/ already exists, change
>>> the -o parameter to another value." error
>>>
>>> I get two types of errors:
>>>
>>> Whenever I start one node with 9 cores it crashes after some 20
>>> minutes during assembly, so I am guessing it is some RAM issue,
>>> although I am "only" running 3.2 M sequences. I do get the "Error,
>>> crambe/ already exists, change the -o parameter to another value."
>>> but it does not seem to affect it. Actually, when I use individual
>>> outfiles per core, only one of the files gets written to.
>>>
>>>
>>> When I spread among different nodes, i.e. 4 nodes with 4 jobs per
>>
>> You mean 4 MPI ranks per node, right ?
>>
>>> node, the job cancels within a minute or so, and individual outfiles
>>> for most of the cores have the following last lines:
>>>
>>> Enabling CorePlugin 'TaxonomyViewer'
>>> Enabling memory usage reporting.
>>> Error, crambe/ already exists, change the -o parameter to another
>>> value.
>>>
>>
>> What is your file system ? And is it shared between your nodes ?
>>
>>> Rank 0: assembler memory usage: 37992 KiB
>>> Rank 0: assembler memory usage: 103716 KiB
>>> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047
>>>
>>> Nodes (08,11) have no errors and finish by
>>>
>>> Rank 0 is loading sequence reads
>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy
>>> loading (please wait...)
>>> Rank 0 has 0 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>> Rank 0 has 100000 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>> Rank 0 has 200000 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>>
>>> Nodes 09,10, have no errors and finish by
>>>
>>> Rank 0 is loading sequence reads
>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy
>>> loading (please wait...)
>>>
>>> Marcelino
>>>
>>> On Apr 16, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>
>>>> To compile the git version:
>>>>
>>>>
>>>> git clone git://github.com/sebhtml/ray.git
>>>> git clone git://github.com/sebhtml/RayPlatform.git
>>>>
>>>> cd ray
>>>> make
>>>>
>>>> On 16/04/13 04:09 PM, Marcelino Suzuki wrote:
>>>>> Hello and thanks
>>>>>
>>>>> Just so you know, I am getting the same type of compilation
>>>>> errors with other .cpp files as well (after I excluded Amos.cpp):
>>>>> code/plugin_CoverageGatherer/CoverageGatherer.cpp (at least, perhaps
>>>>> others)
>>>>>
>>>>> I will wait for the new version, and will try on a different
>>>>> cluster where I compiled with gcc.
>>>>>
>>>>> Marcelino
>>>>>
>>>>> On Apr 16, 2013, at 10:00 PM, Sébastien Boisvert wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> v2.2.0 should be released soon. There is one ticket left.
>>>>>>
>>>>>> Otherwise, I don't think this issue is critical as it does not
>>>>>> prevent assemblies from completing.
>>>>>>
>>>>>> On 16/04/13 03:35 PM, Marcelino Suzuki wrote:
>>>>>>> Hello again
>>>>>>>
>>>>>>> Well, I do not really know how to add the changes using git
>>>>>>> (other than editing the .cpp files from v2.1.0, which are not quite
>>>>>>> the same as the ones in the commit). I tried to clone the latest ray
>>>>>>> and RayPlatform, and after some tweaking to use the intel compiler
>>>>>>> (suggested by my sys admin) the compilation crashed at
>>>>>>> plugin_Amos.cpp. Maybe you can give me some pointers on how to fix
>>>>>>> 2.1.0 (which I was able to compile).
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> commands
>>>>>>> cd
>>>>>>> git clone https://bitbucket.org/sebhtml/ray.git
>>>>>>> mv ray /work/OOBMECO/bin
>>>>>>> git clone https://bitbucket.org/sebhtml/rayplatform.git
>>>>>>> mv rayplatform/ /work/OOBMECO/bin
>>>>>>> cd /work/OOBMECO/bin/ray/
>>>>>>> source /opt/cluster/gcc-4.4.3/refresh.sh
>>>>>>> source /opt/cluster/compilers/intel/Compiler/11.1/072/bin/iccvars.sh intel64
>>>>>>> source /opt/cluster/compilers/intel/impi/4.0.0.028/bin64/mpivars.sh
>>>>>>> find . -name *Makefile* | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>>>
>>>>>>> edited the RayPlatform Makefile and added -DMPICH_IGNORE_CXX_SEEK
>>>>>>>
>>>>>>> make PREFIX=$PWD
>>>>>>>
>>>>>>> ERROR:
>>>>>>>
>>>>>>> CXX code/plugin_Amos/Amos.o
>>>>>>> code/plugin_Amos/Amos.cpp: In member function â:
>>>>>>> code/plugin_Amos/Amos.cpp:239: error: expected primary-expression
>>>>>>> before â token
>>>>>>> code/plugin_Amos/Amos.cpp:239: error: â was not declared in this
>>>>>>> scope
>>>>>>> code/plugin_Amos/Amos.cpp:240: error: expected primary-expression
>>>>>>> before â token
>>>>>>> code/plugin_Amos/Amos.cpp:240: error: â was not declared in this
>>>>>>> scope
>>>>>>> make: *** [code/plugin_Amos/Amos.o] Error 1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> =============================================================================
>>>>>>> oOOOOo Marcelino Suzuki, Assoc. Professor, Intl. Chair, Platform Responsible
>>>>>>> oOOO Univ Pierre Marie Curie (Paris 6) - Observatoire Océanologique de Banyuls
>>>>>>> oOOOOOo. UMR 7621 - Laboratoire d'Océanographie Microbienne (LOMIC)
>>>>>>> .oOOOOOOOo. Marine Biodiversity and Biotechnology (bio2mar) Platform
>>>>>>> .oOOOOOOOOOoo. [email protected] http://bit.ly/fq3nbE bio2mar.obs-banyuls.fr
>>>>>>> .oOOOOOOOOOOOooooo. Ave du Fontaulé, Banyuls-sur-Mer 66650, France +33(0)430192401
>>>>>>> 0000000000000000000000000000000000000000000000000000000000000000000000000000
>>>>>>>
>>>>>>> On Apr 16, 2013, at 3:40 PM, Sébastien Boisvert wrote:
>>>>>>>
>>>>>>>> On 15/04/13 04:40 PM, Marcelino Suzuki wrote:
>>>>>>>>> Hello and thanks for the answer.
>>>>>>>>>
>>>>>>>>> I did understand that, and the directory does not exist when I
>>>>>>>>> launch the job. I do not get this error from the first 4 cores, but
>>>>>>>>> after core 5, I get the error in the output from all cores. The
>>>>>>>>> weirdest thing is that I had these messages in the outfile of a run
>>>>>>>>> that completed, and that is why I was wondering if that is normal.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> This is a race condition affecting version 2.1.0.
>>>>>>>>
>>>>>>>> It was fixed by this commit:
>>>>>>>>
>>>>>>>> commit 73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>> Author: Sébastien Boisvert <[email protected]>
>>>>>>>> Date: Fri Feb 8 23:55:34 2013 -0500
>>>>>>>>
>>>>>>>> fix a race condition during directory probing
>>>>>>>> Resolves-bug: https://github.com/sebhtml/ray/issues/125
>>>>>>>> Signed-off-by: Sébastien Boisvert <sebastien.boisvert.
>>>>>>>> [email protected]>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://bitbucket.org/sebhtml/ray/commits/73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>
>>>>>>>>
>>>>>>>> The upcoming v2.2.0 release will include this bug fix.
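
To illustrate the kind of race involved: if several processes each test whether the output directory exists and then create it, a process that probes just after another one has created the directory will wrongly report that it "already exists". The sketch below is in Python and is not the change made in the commit above. It only shows one common way to avoid a check-then-create race, by letting the atomic `os.mkdir` call decide who created the directory. The helper name `claim_output_directory` is hypothetical.

```python
import os
import tempfile


def claim_output_directory(path):
    """Create path atomically; return True only for the caller that created it.

    Hypothetical helper: os.mkdir either creates the directory or raises
    FileExistsError, so there is no window between "check" and "create".
    """
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False


# Simulate two processes probing the same -o directory, one after the other.
base = tempfile.mkdtemp()
target = os.path.join(base, "crambe")
first = claim_output_directory(target)
second = claim_output_directory(target)
print(first, second)  # True False
```

Only the caller that gets `True` would go on to treat the directory as freshly created; the others know that it was created during this run rather than left over from an earlier one.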
>>>>>>>>
>>>>>>>>
>>>>>>>> This bug was introduced by the mini-ranks code.
>>>>>>>>
>>>>>>>>
>>>>>>>> For example, if you are using a single machine with 32 cores,
>>>>>>>> you can use
>>>>>>>>
>>>>>>>> mpiexec -n 32 Ray ...
>>>>>>>>
>>>>>>>> or you can use
>>>>>>>>
>>>>>>>> mpiexec -n 4 Ray -mini-ranks-per-rank 7 ...
>>>>>>>>
>>>>>>>>
>>>>>>>> Mini-ranks are experimental, but they seem to work nicely on SMP
>>>>>>>> NUMA machines.
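
The arithmetic behind the two command lines above can be sketched as follows. This is a Python sketch under a stated assumption: each MPI rank is taken to run one extra thread for communication in addition to its mini-ranks, which would explain why 7 mini-ranks per rank (rather than 8) are used to fill 32 cores with 4 ranks. That assumption and the function name `threads_used` are illustrative and not taken from this thread.

```python
def threads_used(ranks, mini_ranks_per_rank=0, comm_threads_per_rank=1):
    """Estimate how many cores a launch occupies (hypothetical helper).

    Without mini-ranks, each MPI rank is a single process on one core.
    With mini-ranks, each rank is ASSUMED to run its mini-ranks plus
    comm_threads_per_rank additional communication thread(s).
    """
    if mini_ranks_per_rank == 0:
        return ranks
    return ranks * (mini_ranks_per_rank + comm_threads_per_rank)


print(threads_used(32))                        # mpiexec -n 32 Ray ...
print(threads_used(4, mini_ranks_per_rank=7))  # mpiexec -n 4 Ray -mini-ranks-per-rank 7 ...
```

Under that assumption both launches occupy 32 cores, with the second one doing the assembly work in 4 x 7 = 28 mini-ranks.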
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Marcelino
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 15, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>>>>>
>>>>>>>>>> On 14/04/13 04:45 PM, Marcelino Suzuki wrote:
>>>>>>>>>>> Hello Sebastien
>>>>>>>>>>>
>>>>>>>>>>> I am having issues with assembling a 3.2 M read Ion Torrent
>>>>>>>>>>> dataset using LoadLeveler on an Intel iDataPlex cluster, with the
>>>>>>>>>>> MPI job crashing during execution. It might be an issue with the
>>>>>>>>>>> cluster (I got this message from the system administrator: "At the
>>>>>>>>>>> moment we are having problems with the IB network and the disk
>>>>>>>>>>> array"), but since I've seen an error in the output files coming
>>>>>>>>>>> from some cores using the -output-filename option, I just want to
>>>>>>>>>>> make sure I am not missing some detail:
>>>>>>>>>>>
>>>>>>>>>>> Error, crambe/ already exists, change the -o parameter to another
>>>>>>>>>>> value
>>>>>>>>>>>
>>>>>>>>>>> crambe is my -o directory
>>>>>>>>>>>
>>>>>>>>>>> Is that normal behavior? I noticed I got the same errors in an
>>>>>>>>>>> outfile of the same assembly that did go through to the end.
>>>>>>>>>>
>>>>>>>>>> Ray will not overwrite an existing directory.
>>>>>>>>>>
>>>>>>>>>> You must use another directory name.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Marcelino
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Denovoassembler-users mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>