On 19/04/13 02:46 PM, Marcelino Suzuki wrote:
> Hi Sebastien,
>
> Well, I was able to go through with the assembly with Ray 2.2.0, and hit the wall with 1 node and 9 ranks, and with 2 nodes and 16 ranks, during BiologicalAbundances. I am still having issues with multiple nodes/cores. I tried several combinations, and the process gets killed after ~30 min. I think you call ranks the number of processes you launch with mpirun -np, correct?
Yes, an MPI (Message Passing Interface) rank is usually a process running the application.

> Perhaps you might have some guesses as to why. The error message I got with 3 nodes and 25 cores (llsubmit # @ total_tasks = 25) is:
>
> mpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> node017
> node028
> [node028:03832] [[28276,0],2] routed:binomial: Connection to lifeline [[28276,0],0] lost
> [node017:03879] [[28276,0],1] routed:binomial: Connection to lifeline [[28276,0],0] lost
>

This is sometimes a byproduct of a node running out of memory.

> Any help is very welcome.
>
> Marcelino
>
>
> On Apr 17, 2013, at 1:57 PM, Sébastien Boisvert wrote:
>
>> See my responses below.
>>
>> On 16/04/13 06:16 PM, Marcelino Suzuki wrote:
>>> Hurrah, it compiled, but could you please take a look at "HELP!" below?
>>>
>>> Well, after I picked the Intel compiler and added the -DMPICH_IGNORE_CXX_SEEK option in the Makefile of RayPlatform
>>>
>>> $(Q)$(MPICXX) $(CXXFLAGS) -DMPICH_IGNORE_CXX_SEEK -D RAYPLATFORM_VERSION=\"$(RAYPLATFORM_VERSION)\" -I. -c -o $@ $<
>>>
>>> and in the Makefile of Ray
>>>
>>> find . -name *Makefile* | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>
>>> it did compile through.
>>>
>>> === HELP!
>>>
>>> I tried to run jobs with the newly compiled 2.2.0, but unfortunately I still get the "Error, crambe/ already exists, change the -o parameter to another value." error.
>>>
>>> I get two types of errors:
>>>
>>> Whenever I start one node with 9 cores, it crashes after some 20 minutes during assembly, so I am guessing it is some RAM issue, although I am "only" running 3.2 M sequences. I do get the "Error, crambe/ already exists, change the -o parameter to another value." message, but it does not seem to affect the run. Actually, when I write individual outfiles per core, only one of the files gets written to.
>>>
>>> When I spread the job among different nodes, i.e. 4 nodes with 4 jobs per
>>
>> You mean 4 MPI ranks per node, right?
>>
>>> node, the job cancels within a minute or so, and the individual outfiles for most of the cores have the following last lines:
>>>
>>> Enabling CorePlugin 'TaxonomyViewer'
>>> Enabling memory usage reporting.
>>> Error, crambe/ already exists, change the -o parameter to another value.
>>>
>>
>> What is your file system? And is it shared between your nodes?
>>
>>> Rank 0: assembler memory usage: 37992 KiB
>>> Rank 0: assembler memory usage: 103716 KiB
>>> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047
>>>
>>> Nodes 08 and 11 have no errors and finish with
>>>
>>> Rank 0 is loading sequence reads
>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy loading (please wait...)
>>> Rank 0 has 0 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>> Rank 0 has 100000 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>> Rank 0 has 200000 sequence reads
>>> Rank 0: assembler memory usage: 142416 KiB
>>>
>>> Nodes 09 and 10 have no errors and finish with
>>>
>>> Rank 0 is loading sequence reads
>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy loading (please wait...)
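Sébastien's out-of-memory explanation can be checked against the "assembler memory usage: N KiB" lines that Ray prints once memory usage reporting is enabled, as in the logs quoted above. Below is a minimal POSIX shell sketch, not part of Ray: the function name `peak_memory_kib` is hypothetical, and it assumes one output file per rank with lines formatted exactly like `Rank 0: assembler memory usage: 142416 KiB`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ray): for each output file given as an
# argument, print the file name and the largest value found on
# "assembler memory usage: N KiB" lines, e.g.
#   Rank 0: assembler memory usage: 142416 KiB
# Files without such lines report 0.
peak_memory_kib() {
    for log in "$@"; do
        awk -v file="$log" '
            /assembler memory usage:/ && $NF == "KiB" {
                # The number sits just before the trailing "KiB" unit.
                value = $(NF - 1) + 0
                if (value > peak) peak = value
            }
            END { printf "%s %d\n", file, peak }
        ' "$log"
    done
}
```

For example, `peak_memory_kib *.out` prints one "file peak" pair per output file. Adding up the peaks of the ranks placed on the same node gives a rough figure to compare with that node's physical RAM; because these are periodic samples, the true peak can be somewhat higher.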
>>>
>>>
>>> Marcelino
>>>
>>> On Apr 16, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>
>>>> To compile the git version:
>>>>
>>>>
>>>> git clone git://github.com/sebhtml/ray.git
>>>> git clone git://github.com/sebhtml/RayPlatform.git
>>>>
>>>> cd ray
>>>> make
>>>>
>>>> On 16/04/13 04:09 PM, Marcelino Suzuki wrote:
>>>>> Hello and thanks,
>>>>>
>>>>> Just so you know, I am getting the same type of compilation errors with other .cpp files as well (after I excluded Amos.cpp): code/plugin_CoverageGatherer/CoverageGatherer.cpp (at least, perhaps others).
>>>>>
>>>>> I will wait for the new version, and will try on a different cluster where I compiled with gcc.
>>>>>
>>>>> Marcelino
>>>>>
>>>>> On Apr 16, 2013, at 10:00 PM, Sébastien Boisvert wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> v2.2.0 should be released soon. There is one ticket left.
>>>>>>
>>>>>> Otherwise, I don't think this issue is critical, as it does not prevent assemblies from completing.
>>>>>>
>>>>>> On 16/04/13 03:35 PM, Marcelino Suzuki wrote:
>>>>>>> Hello again,
>>>>>>>
>>>>>>> Well, I do not really know how to add the changes using git (other than editing the .cpp files from v2.1.0, which are not quite the same as the ones in the commit). I tried to clone the latest Ray and RayPlatform, and after some tweaking to use the Intel compiler (suggested by my sysadmin), the compilation crashed at plugin_Amos.cpp.
>>>>>>> Maybe you can give me some pointers on how to fix 2.1.0 (which I was able to compile).
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Commands:
>>>>>>> cd
>>>>>>> git clone https://bitbucket.org/sebhtml/ray.git
>>>>>>> mv ray /work/OOBMECO/bin
>>>>>>> git clone https://bitbucket.org/sebhtml/rayplatform.git
>>>>>>> mv rayplatform/ /work/OOBMECO/bin
>>>>>>> cd /work/OOBMECO/bin/ray/
>>>>>>> source /opt/cluster/gcc-4.4.3/refresh.sh
>>>>>>> source /opt/cluster/compilers/intel/Compiler/11.1/072/bin/iccvars.sh intel64
>>>>>>> source /opt/cluster/compilers/intel/impi/4.0.0.028/bin64/mpivars.sh
>>>>>>> find . -name *Makefile* | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>>>
>>>>>>> Then I edited the RayPlatform Makefile and added -DMPICH_IGNORE_CXX_SEEK.
>>>>>>>
>>>>>>> make PREFIX=$PWD
>>>>>>>
>>>>>>> ERROR:
>>>>>>>
>>>>>>> CXX code/plugin_Amos/Amos.o
>>>>>>> code/plugin_Amos/Amos.cpp: In member function â:
>>>>>>> code/plugin_Amos/Amos.cpp:239: error: expected primary-expression before â token
>>>>>>> code/plugin_Amos/Amos.cpp:239: error: â was not declared in this scope
>>>>>>> code/plugin_Amos/Amos.cpp:240: error: expected primary-expression before â token
>>>>>>> code/plugin_Amos/Amos.cpp:240: error: â was not declared in this scope
>>>>>>> make: *** [code/plugin_Amos/Amos.o] Error 1
>>>>>>>
>>>>>>>
>>>>>>> ============================================================================
>>>>>>> oOOOOo Marcelino Suzuki, Assoc. Professor, Intl. Chair, Platform Responsible
>>>>>>> oOOO Univ Pierre Marie Curie (Paris 6) - Observatoire Océanologique de Banyuls
>>>>>>> oOOOOOo. UMR 7621 - Laboratoire d'Océanographie Microbienne (LOMIC)
>>>>>>> .oOOOOOOOo. Marine Biodiversity and Biotechnology (bio2mar) Platform
>>>>>>> .oOOOOOOOOOoo. suz...@obs-banyuls.fr http://bit.ly/fq3nbE bio2mar.obs-banyuls.fr
>>>>>>> .oOOOOOOOOOOOooooo. Ave du Fontaulé, Banyuls-sur-Mer 66650, France +33(0)430192401
>>>>>>> 0000000000000000000000000000000000000000000000000000000000000000000000000000
>>>>>>>
>>>>>>> On Apr 16, 2013, at 3:40 PM, Sébastien Boisvert wrote:
>>>>>>>
>>>>>>>> On 15/04/13 04:40 PM, Marcelino Suzuki wrote:
>>>>>>>>> Hello and thanks for the answer.
>>>>>>>>>
>>>>>>>>> I did understand that, and the directory does not exist when I launch the job. I do not get this error from the first 4 cores, but from core 5 onward, I get the error in the output of every core. The weirdest thing is that I had these messages in the outfile of a run that completed, and that is why I was wondering if that is normal.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> This is a race condition affecting version 2.1.0.
>>>>>>>>
>>>>>>>> It was fixed by this commit:
>>>>>>>>
>>>>>>>> commit 73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>> Author: Sébastien Boisvert <sebastien.boisver...@ulaval.ca>
>>>>>>>> Date: Fri Feb 8 23:55:34 2013 -0500
>>>>>>>>
>>>>>>>>     fix a race condition during directory probing
>>>>>>>>     Resolves-bug: https://github.com/sebhtml/ray/issues/125
>>>>>>>>     Signed-off-by: Sébastien Boisvert <sebastien.boisvert.3...@ulaval.ca>
>>>>>>>>
>>>>>>>>
>>>>>>>> https://bitbucket.org/sebhtml/ray/commits/73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>
>>>>>>>>
>>>>>>>> The upcoming v2.2.0 release will include this bug fix.
>>>>>>>>
>>>>>>>>
>>>>>>>> This bug was introduced by the mini-ranks code.
>>>>>>>>
>>>>>>>>
>>>>>>>> For example, if you are using a single machine with 32 cores, you can use
>>>>>>>>
>>>>>>>> mpiexec -n 32 Ray ...
>>>>>>>>
>>>>>>>> or you can use
>>>>>>>>
>>>>>>>> mpiexec -n 4 Ray -mini-ranks-per-rank 7 ...
>>>>>>>>
>>>>>>>>
>>>>>>>> Mini-ranks are experimental, but they seem to work nicely on SMP NUMA machines.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Marcelino
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Apr 15, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>>>>>
>>>>>>>>>> On 14/04/13 04:45 PM, Marcelino Suzuki wrote:
>>>>>>>>>>> Hello Sebastien,
>>>>>>>>>>>
>>>>>>>>>>> I am having issues assembling a 3.2 M read Ion Torrent dataset using LoadLeveler on an Intel iDataPlex cluster, with the MPI job crashing during execution. It might be an issue with the cluster (I got this message from the system administrator: "At the moment we are having problems with the IB (InfiniBand) network and the disk array"), but since I've seen an error in the output files coming from some cores using the -output-filename option, I just want to make sure I am not missing some detail:
>>>>>>>>>>>
>>>>>>>>>>> Error, crambe/ already exists, change the -o parameter to another value
>>>>>>>>>>>
>>>>>>>>>>> crambe is my -o directory.
>>>>>>>>>>>
>>>>>>>>>>> Is that normal behavior? I noticed I got the same errors in an outfile of the same assembly that did go through to the end.
>>>>>>>>>>
>>>>>>>>>> Ray will not overwrite an existing directory.
>>>>>>>>>>
>>>>>>>>>> You must use another directory name.
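Since Ray refuses to reuse an existing -o directory, a job script can choose a name that does not exist yet before calling mpirun. Below is a minimal POSIX shell sketch; the function `fresh_output_dir` and its `.1`, `.2`, ... suffix scheme are hypothetical choices, not Ray features. It only picks a name and deliberately does not create the directory, because Ray creates it itself and would otherwise report that it already exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ray): print a directory name derived from
# $1 that does not exist yet. It does NOT create the directory, since Ray
# refuses an -o directory that already exists.
fresh_output_dir() {
    base=$1
    candidate=$base
    n=1
    # Try base, base.1, base.2, ... until a free name is found.
    while [ -e "$candidate" ]; do
        candidate="$base.$n"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

Call it once in the job script, for example `OUT=$(fresh_output_dir crambe)`, and hand `-o "$OUT"` to all ranks through the single mpirun command line. Calling it separately on each node of a shared file system would reintroduce the kind of directory-probing race described in this thread.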
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> Marcelino
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>>> Precog is a next-generation analytics platform capable of advanced analytics on semi-structured data. The platform includes APIs for building apps and a phenomenal toolset for data science. Developers can use our toolset for easy data analysis & visualization. Get a free account!
>>>>>>>>>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Denovoassembler-users mailing list
>>>>>>>>>>> Denovoassembler-users@lists.sourceforge.net
>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
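Sébastien's sizing example in this thread (mpiexec -n 32 Ray ... versus mpiexec -n 4 Ray -mini-ranks-per-rank 7 ... on a 32-core machine) is consistent with each MPI rank keeping one core for itself plus one core per mini-rank, i.e. cores = ranks × (mini-ranks + 1) = 4 × (7 + 1) = 32. That reading is an assumption inferred from the example, not documented Ray behavior. Under it, a minimal POSIX shell sketch for choosing the -mini-ranks-per-rank value could look like this; the function name `mini_ranks_per_rank` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sizing helper (not part of Ray).
# Assumption, inferred from the 32-core example in the thread rather than from
# Ray's documentation: cores = ranks * (mini_ranks_per_rank + 1).
# Usage: mini_ranks_per_rank CORES RANKS
mini_ranks_per_rank() {
    cores=$1
    ranks=$2
    # Refuse layouts where the cores cannot be split evenly over the ranks.
    if [ "$ranks" -le 0 ] || [ $((cores % ranks)) -ne 0 ]; then
        echo "error: $cores cores cannot be split evenly over $ranks ranks" >&2
        return 1
    fi
    per_rank=$((cores / ranks))
    # One core per rank leaves no room for mini-ranks under this assumption.
    if [ "$per_rank" -lt 2 ]; then
        echo "error: need at least 2 cores per rank to use mini-ranks" >&2
        return 1
    fi
    echo $((per_rank - 1))
}
```

For instance, `mini_ranks_per_rank 32 4` prints 7, matching the example, while `mini_ranks_per_rank 25 4` fails because 25 cores do not split evenly over 4 ranks.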