On 20/04/13 04:18 AM, Marcelino Suzuki wrote:
>          Hi Sebastien
>
>          One last question (I hope for the moment)
>
>          For the runs that work for me (2 nodes, 16 cores), whenever I ask for
> -output-filename, I get quite different outputs per rank (i.e.
> two of the outfiles show the progress while the rest don't change and
> end with the following lines):
>
> -s (single sequences)
>    Sequences: /scratch/suzukim/mira/CCB1a.fastq
> Enabling CorePlugin 'TaxonomyViewer'
> Enabling memory usage reporting.
> Error, crambe/ already exists, change the -o parameter to another value.
>
> Rank 0: assembler memory usage: 37988 KiB
> Rank 0: assembler memory usage: 103712 KiB
> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 8169
>
>          Is this behavior "normal"?  Or does it mean I am only running the
> calculations on two cores?  If that is the case, I could try to assign
> one rank per node, but since I have been having errors whenever I use
> more than two nodes, I'd rather not.
>

Can you provide your command line for the 2-node job for which you are
experiencing problems?

Also, which version are you using? v2.2.0 has a lot of bug fixes under the
hood.
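
For reference, a 2-node launch usually looks something like this (the k-mer
length, file paths, log prefix and output directory below are only
placeholders taken from your logs; adjust them to your actual run):

    mpirun -np 16 -output-filename ray-log \
        Ray -k 31 \
        -s /scratch/suzukim/mira/CCB1a.fastq \
        -o crambe

Here -output-filename is the Open MPI mpirun option that redirects each
rank's output to its own file, and -o must name a directory that does not
exist yet and that every node can reach.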

>          Thanks much!
>
>          Marcelino
>
> ==========================================================================
>               oOOOOo      Marcelino Suzuki, Assoc. Professor, Intl. Chair,
>             oOOO          Platform Responsible
>          oOOOOOo.         Univ Pierre Marie Curie (Paris 6) -
>       .oOOOOOOOo.         Observatoire Océanologique de Banyuls
>     .oOOOOOOOOOoo.        UMR 7621 - Laboratoire d'Océanographie Microbienne (LOMIC)
> .oOOOOOOOOOOOooooo.       Marine Biodiversity and Biotechnology (bio2mar) Platform
>                           suz...@obs-banyuls.fr   http://bit.ly/fq3nbE   bio2mar.obs-banyuls.fr
>                           Ave du Fontaulé, Banyuls-sur-Mer 66650, France   +33(0)430192401
> ==========================================================================
>
> On Apr 19, 2013, at 10:52 PM, Sébastien Boisvert wrote:
>
>> On 19/04/13 02:46 PM, Marcelino Suzuki wrote:
>>>          Hi Sebastien
>>>
>>>          Well, I was able to go through with the assembly with
>>> Ray 2.2.0, and hit the wall with 1 node / 9 ranks and 2 nodes / 16
>>> ranks during BiologicalAbundances.  I am still having issues with
>>> multiple nodes/cores.  I tried several combinations and the process
>>> gets killed after ~30 min.  I think you call "ranks" the number of
>>> processes you launch via mpirun -np, correct?
>>
>> Yes, a MPI (message passing interface) rank is usually a process
>> running the application.
>>
>>> Perhaps you might have some guesses as to why.  The error message I
>>> got with 3 nodes and 25 cores (llsubmit, # @ total_tasks = 25) is:
>>>
>>> mpirun: killing job...
>>>
>>> --------------------------------------------------------------------------
>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>> below. Additional manual cleanup may be required - please refer to
>>> the "orte-clean" tool for assistance.
>>> --------------------------------------------------------------------------
>>>           node017
>>>           node028
>>> [node028:03832] [[28276,0],2] routed:binomial: Connection to lifeline
>>> [[28276,0],0] lost
>>> [node017:03879] [[28276,0],1] routed:binomial: Connection to lifeline
>>> [[28276,0],0] lost
>>>
>>
>> This is sometimes a byproduct of a node running out of memory.
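>>
>> If memory is indeed the problem, one thing you could try (assuming Open
>> MPI's mpirun honours it under LoadLeveler) is to place fewer ranks on
>> each node so that each node's RAM is shared by fewer processes, for
>> example:
>>
>>     mpirun -np 16 -npernode 4 Ray ...
>>
>> which puts only 4 of the 16 ranks on each node.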
>>
>>>          Any help very welcome
>>>
>>>          Marcelino
>>>
>>>
>>> On Apr 17, 2013, at 1:57 PM, Sébastien Boisvert wrote:
>>>
>>>> See my responses below.
>>>>
>>>> On 16/04/13 06:16 PM, Marcelino Suzuki wrote:
>>>>>          Hurrah it compiled, but could you please take a look at
>>>>> "HELP!" below
>>>>>
>>>>>
>>>>>          Well, after I picked the Intel compiler and added the
>>>>> -DMPICH_IGNORE_CXX_SEEK option in the Makefile of RayPlatform
>>>>>
>>>>>           $(Q)$(MPICXX) $(CXXFLAGS) -DMPICH_IGNORE_CXX_SEEK -DRAYPLATFORM_VERSION=\"$(RAYPLATFORM_VERSION)\" -I. -c -o $@ $<
>>>>>
>>>>>          and in the Makefiles of ray
>>>>>
>>>>>          find . -name '*Makefile*' | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>
>>>>>          It did compile through.
>>>>>
>>>>>
>>>>>
>>>>> ===     HELP!
>>>>>
>>>>>          I tried to run jobs with the newly compiled 2.2.0, but
>>>>> unfortunately I still get the "Error, crambe/ already exists, change
>>>>> the -o parameter to another value." error.
>>>>>
>>>>>          I get two types of errors:
>>>>>
>>>>>          Whenever I start one node with 9 cores, it crashes after some
>>>>> 20 minutes during assembly, so I am guessing it is some RAM issue,
>>>>> although I am "only" running 3.2 M sequences.  I do get the "Error,
>>>>> crambe/ already exists, change the -o parameter to another value."
>>>>> message, but it does not seem to affect the run.  Actually, when I
>>>>> write individual outfiles per core, only one of the files gets
>>>>> written to.
>>>>>
>>>>>
>>>>>          When I spread among different nodes, i.e. 4 nodes with 4
>>>>> jobs per
>>>>
>>>> You mean 4 MPI ranks per node, right?
>>>>
>>>>> node, the job cancels within a minute or so, and the individual
>>>>> outfiles for most of the cores have the following last lines:
>>>>>
>>>>> Enabling CorePlugin 'TaxonomyViewer'
>>>>> Enabling memory usage reporting.
>>>>> Error, crambe/ already exists, change the -o parameter to another
>>>>> value.
>>>>>
>>>>
>>>> What is your file system?  And is it shared between your nodes?
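>>>>
>>>> (A quick way to check is to run something like
>>>>
>>>>     df -hT /scratch/suzukim
>>>>
>>>> on two different nodes: the input fastq and the -o directory have to
>>>> live on a file system that every rank can see.)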
>>>>
>>>>> Rank 0: assembler memory usage: 37992 KiB
>>>>> Rank 0: assembler memory usage: 103716 KiB
>>>>> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047
>>>>>
>>>>>          Nodes (08, 11) have no errors and end with:
>>>>>
>>>>> Rank 0 is loading sequence reads
>>>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy
>>>>> loading (please wait...)
>>>>> Rank 0 has 0 sequence reads
>>>>> Rank 0: assembler memory usage: 142416 KiB
>>>>> Rank 0 has 100000 sequence reads
>>>>> Rank 0: assembler memory usage: 142416 KiB
>>>>> Rank 0 has 200000 sequence reads
>>>>> Rank 0: assembler memory usage: 142416 KiB
>>>>>
>>>>>          Nodes 09, 10 have no errors and end with:
>>>>>
>>>>> Rank 0 is loading sequence reads
>>>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy
>>>>> loading (please wait...)
>>>>>
>>>>>          Marcelino
>>>>>
>>>>> On Apr 16, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>
>>>>>> To compile the git version:
>>>>>>
>>>>>>
>>>>>> git clone git://github.com/sebhtml/ray.git
>>>>>> git clone git://github.com/sebhtml/RayPlatform.git
>>>>>>
>>>>>> cd ray
>>>>>> make
>>>>>>
>>>>>> On 16/04/13 04:09 PM, Marcelino Suzuki wrote:
>>>>>>>       Hello and thanks
>>>>>>>
>>>>>>>       Just so you know, I am getting the same type of compilation
>>>>>>> errors with other .cpp files as well (after I excluded Amos.cpp),
>>>>>>> e.g. code/plugin_CoverageGatherer/CoverageGatherer.cpp (at least,
>>>>>>> perhaps others).
>>>>>>>
>>>>>>>       I will wait for the new version, and will try on a different
>>>>>>> cluster where I compiled with gcc.
>>>>>>>
>>>>>>>       Marcelino
>>>>>>>
>>>>>>> On Apr 16, 2013, at 10:00 PM, Sébastien Boisvert wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> v2.2.0 should be released soon. There is one ticket left.
>>>>>>>>
>>>>>>>> Otherwise, I don't think this issue is critical, as it does not
>>>>>>>> prevent assemblies from completing.
>>>>>>>>
>>>>>>>> On 16/04/13 03:35 PM, Marcelino Suzuki wrote:
>>>>>>>>>     Hello again
>>>>>>>>>
>>>>>>>>>     Well, I do not really know how to add the changes using git
>>>>>>>>> (other than editing the .cpp files from v2.1.0, which are not quite
>>>>>>>>> the same as the ones in the commit).  I tried to clone the latest
>>>>>>>>> ray and RayPlatform, and after some tweaking to use the Intel
>>>>>>>>> compiler (suggested by my sys admin), the compilation crashed at
>>>>>>>>> plugin_Amos.cpp.  Maybe you can give me some pointers on how to fix
>>>>>>>>> 2.1.0 (which I was able to compile).
>>>>>>>>>
>>>>>>>>>     Thanks
>>>>>>>>>
>>>>>>>>> commands
>>>>>>>>>                   cd
>>>>>>>>>                   git clone https://bitbucket.org/sebhtml/ray.git
>>>>>>>>>                   mv ray /work/OOBMECO/bin
>>>>>>>>>                   git clone https://bitbucket.org/sebhtml/rayplatform.git
>>>>>>>>>                   mv rayplatform/ /work/OOBMECO/bin
>>>>>>>>>                   cd /work/OOBMECO/bin/ray/
>>>>>>>>>                   source /opt/cluster/gcc-4.4.3/refresh.sh
>>>>>>>>>                   source /opt/cluster/compilers/intel/Compiler/11.1/072/bin/iccvars.sh intel64
>>>>>>>>>                   source /opt/cluster/compilers/intel/impi/4.0.0.028/bin64/mpivars.sh
>>>>>>>>>                   find . -name '*Makefile*' | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>>>>>
>>>>>>>>>             edited the RayPlatform Makefile and added -DMPICH_IGNORE_CXX_SEEK
>>>>>>>>>
>>>>>>>>>             make PREFIX=$PWD
>>>>>>>>>
>>>>>>>>> ERROR:
>>>>>>>>>
>>>>>>>>> CXX code/plugin_Amos/Amos.o
>>>>>>>>> code/plugin_Amos/Amos.cpp: In member function '...':
>>>>>>>>> code/plugin_Amos/Amos.cpp:239: error: expected primary-expression before '...' token
>>>>>>>>> code/plugin_Amos/Amos.cpp:239: error: '...' was not declared in this scope
>>>>>>>>> code/plugin_Amos/Amos.cpp:240: error: expected primary-expression before '...' token
>>>>>>>>> code/plugin_Amos/Amos.cpp:240: error: '...' was not declared in this scope
>>>>>>>>> make: *** [code/plugin_Amos/Amos.o] Error 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 16, 2013, at 3:40 PM, Sébastien Boisvert wrote:
>>>>>>>>>
>>>>>>>>>> On 15/04/13 04:40 PM, Marcelino Suzuki wrote:
>>>>>>>>>>>   Hello and thanks for the answer.
>>>>>>>>>>>
>>>>>>>>>>>   I did understand that, and the directory does not exist when I
>>>>>>>>>>> launch the job.  I do not get this error from the first 4 cores,
>>>>>>>>>>> but from core 5 on, I get the error in the output from all cores.
>>>>>>>>>>> The weirdest thing is that I had these messages in the outfile of
>>>>>>>>>>> a run that completed, and that is why I was wondering if that is
>>>>>>>>>>> normal.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is a race condition affecting version 2.1.0.
>>>>>>>>>>
>>>>>>>>>> It was fixed by this commit:
>>>>>>>>>>
>>>>>>>>>> commit 73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>>> Author: Sébastien Boisvert <sebastien.boisver...@ulaval.ca>
>>>>>>>>>> Date:   Fri Feb 8 23:55:34 2013 -0500
>>>>>>>>>>
>>>>>>>>>>     fix a race condition during directory probing
>>>>>>>>>>
>>>>>>>>>>     Resolves-bug: https://github.com/sebhtml/ray/issues/125
>>>>>>>>>>     Signed-off-by: Sébastien Boisvert <sebastien.boisvert.3...@ulaval.ca>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://bitbucket.org/sebhtml/ray/commits/73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The upcoming v2.2.0 release will include this bug fix.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This bug was introduced by the mini-ranks code.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For example, if you are using a single machine with 32 cores,
>>>>>>>>>> you can use
>>>>>>>>>>
>>>>>>>>>>     mpiexec -n 32 Ray ...
>>>>>>>>>>
>>>>>>>>>> or you can use
>>>>>>>>>>
>>>>>>>>>>     mpiexec -n 4 Ray -mini-ranks-per-rank 7 ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mini-ranks are experimental, but they seem to work nicely on SMP
>>>>>>>>>> NUMA machines.
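>>>>>>>>>>
>>>>>>>>>> (If I count correctly, each MPI rank then drives one communication
>>>>>>>>>> thread plus its mini-ranks, so -n 4 with -mini-ranks-per-rank 7
>>>>>>>>>> gives 4 x (1 + 7) = 32 threads for the 32 cores.  On a 2-node
>>>>>>>>>> setup with 8 cores per node, the equivalent would be something
>>>>>>>>>> like
>>>>>>>>>>
>>>>>>>>>>     mpiexec -n 2 Ray -mini-ranks-per-rank 7 ...
>>>>>>>>>>
>>>>>>>>>> i.e. one MPI rank per node, each using 8 cores.)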
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>   Marcelino
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Apr 15, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 14/04/13 04:45 PM, Marcelino Suzuki wrote:
>>>>>>>>>>>>>         Hello Sebastien
>>>>>>>>>>>>>
>>>>>>>>>>>>>         I am having issues with assembling a 3.2 M read Ion
>>>>>>>>>>>>> Torrent dataset using LoadLeveler on an Intel iDataPlex cluster,
>>>>>>>>>>>>> with the MPI job crashing during execution.  It might be an
>>>>>>>>>>>>> issue with the cluster (I got this message from the system
>>>>>>>>>>>>> administrator: at the moment we are having trouble with the IB
>>>>>>>>>>>>> network and with a disk array), but since I've seen an error in
>>>>>>>>>>>>> the output files coming from some cores when using the
>>>>>>>>>>>>> -output-filename option, I just want to make sure I am not
>>>>>>>>>>>>> missing some detail:
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Error, crambe/ already exists, change the -o parameter
>>>>>>>>>>>>> to another value
>>>>>>>>>>>>>
>>>>>>>>>>>>>         crambe is my -o directory.
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Is that normal behavior?  I noticed I got the same
>>>>>>>>>>>>> errors in an outfile of the same assembly that did go through
>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>
>>>>>>>>>>>> Ray will not overwrite an existing directory.
>>>>>>>>>>>>
>>>>>>>>>>>> You must use another directory name.
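>>>>>>>>>>>>
>>>>>>>>>>>> For example, either pick a fresh name for -o, or remove the stale
>>>>>>>>>>>> directory first if you no longer need its contents (crambe is the
>>>>>>>>>>>> directory name from your message):
>>>>>>>>>>>>
>>>>>>>>>>>>     rm -rf crambe/
>>>>>>>>>>>>     mpirun -np 16 Ray ... -o crambe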
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Marcelino
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


