On 20/04/13 04:18 AM, Marcelino Suzuki wrote:
> Hi Sebastien
>
> One last question (I hope, for the moment).
>
> For the runs that work for me (2 nodes, 16 cores), whenever I ask for
> the -output-filename, I get quite different outputs per rank, i.e.
> two of the outfiles show the progress while the rest don't change and
> output the following lines:
>
> -s (single sequences)
>  Sequences: /scratch/suzukim/mira/CCB1a.fastq
> Enabling CorePlugin 'TaxonomyViewer'
> Enabling memory usage reporting.
> Error, crambe/ already exists, change the -o parameter to another value.
>
> Rank 0: assembler memory usage: 37988 KiB
> Rank 0: assembler memory usage: 103712 KiB
> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 8169
>
> Is this behavior "normal"? Or does it mean I am only running the
> calculations on two cores? If that is the case, I could try to assign
> one rank per node, but since I have been getting errors whenever I use
> more than two nodes, I would rather not.
>
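One way to compare the per-rank outfiles is to pull out the startup line each of them prints. The following is a minimal Python sketch, not part of Ray. It assumes the line has the format shown in the quoted output (`Rank= 0 Size= 1 ProcessIdentifier= 8169`) and that `Size` is the number of ranks the process sees, which is the usual MPI meaning but is not confirmed in this thread. The function names `startup_info` and `all_report_single_rank` are made up for this example.

```python
import re

# Matches the startup line quoted above, e.g.
# "Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 8169"
STARTUP_LINE = re.compile(
    r"Rank=\s*(\d+)\s+Size=\s*(\d+)\s+ProcessIdentifier=\s*(\d+)"
)


def startup_info(log_text):
    """Return (rank, size, pid) from the first startup line, or None if absent."""
    match = STARTUP_LINE.search(log_text)
    if match is None:
        return None
    rank, size, pid = (int(group) for group in match.groups())
    return rank, size, pid


def all_report_single_rank(logs):
    """True when at least one log has a startup line and all of them say Size= 1."""
    infos = [startup_info(text) for text in logs]
    found = [info for info in infos if info is not None]
    return bool(found) and all(size == 1 for _, size, _ in found)


# Sample text copied from the outfile quoted above.
sample = """Enabling memory usage reporting.
Rank 0: assembler memory usage: 103712 KiB
Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 8169
"""
info = startup_info(sample)
print(info)
```

If every outfile reports `Rank= 0` and `Size= 1`, one possible reading is that each process started on its own instead of joining a single 16-rank job. That is only an inference from the log format, so the command line and the MPI launcher in use are still worth checking.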
Can you provide the command line for the 2-node job that is giving you problems?

Also, which version are you using? v2.2.0 has a lot of bug fixes under the hood.

> Thanks much!
>
> Marcelino
>
> ========================================================================
>        oOOOOo       Marcelino Suzuki, Assoc. Professor, Intl. Chair, Platform Responsible
>         oOOO        Univ Pierre Marie Curie (Paris 6) - Observatoire Océanologique de Banyuls
>       oOOOOOo.      UMR 7621 - Laboratoire d'Océanographie Microbienne (LOMIC)
>     .oOOOOOOOo.     Marine Biodiversity and Biotechnology (bio2mar) Platform
>   .oOOOOOOOOOoo.    suz...@obs-banyuls.fr http://bit.ly/fq3nbE bio2mar.obs-banyuls.fr
> .oOOOOOOOOOOOooooo. Ave du Fontaulé, Banyuls-sur-Mer 66650, France +33(0)430192401
> 0000000000000000000000000000000000000000000000000000000000000000000000000000
>
> On Apr 19, 2013, at 10:52 PM, Sébastien Boisvert wrote:
>
>> On 19/04/13 02:46 PM, Marcelino Suzuki wrote:
>>> Hi Sebastien
>>>
>>> Well, I was able to get through the assembly with Ray 2.2.0, and hit
>>> the wall with 1 node 9 ranks and 2 nodes 16 ranks during
>>> BiologicalAbundances. I am still having issues with multiple
>>> nodes/cores. I tried several combinations and the process gets killed
>>> after ~30 min. I think what you call ranks is the number of processes
>>> you launch via mpirun -np, correct?
>>
>> Yes, an MPI (message passing interface) rank is usually a process
>> running the application.
>>
>>> Perhaps you might have some guesses as to why. The error message I got
>>> with 3 nodes and 25 cores (llsubmit # @ total_tasks = 25) is:
>>>
>>> mpirun: killing job...
>>>
>>> --------------------------------------------------------------------------
>>> mpirun was unable to cleanly terminate the daemons on the nodes shown
>>> below. Additional manual cleanup may be required - please refer to
>>> the "orte-clean" tool for assistance.
>>> --------------------------------------------------------------------------
>>> node017
>>> node028
>>> [node028:03832] [[28276,0],2] routed:binomial: Connection to lifeline [[28276,0],0] lost
>>> [node017:03879] [[28276,0],1] routed:binomial: Connection to lifeline [[28276,0],0] lost
>>>
>>
>> This is sometimes a byproduct of a node running out of memory.
>>
>>> Any help very welcome
>>>
>>> Marcelino
>>>
>>>
>>> On Apr 17, 2013, at 1:57 PM, Sébastien Boisvert wrote:
>>>
>>>> See my responses below.
>>>>
>>>> On 16/04/13 06:16 PM, Marcelino Suzuki wrote:
>>>>> Hurrah, it compiled, but could you please take a look at "HELP!" below.
>>>>>
>>>>> Well, after I picked the Intel compiler and added the
>>>>> -DMPICH_IGNORE_CXX_SEEK option in the Makefile of RayPlatform
>>>>>
>>>>> $(Q)$(MPICXX) $(CXXFLAGS) -DMPICH_IGNORE_CXX_SEEK -D RAYPLATFORM_VERSION=\"$(RAYPLATFORM_VERSION)\" -I. -c -o $@ $<
>>>>>
>>>>> and the Makefile of ray
>>>>>
>>>>> find . -name *Makefile* | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>
>>>>> it did compile through.
>>>>>
>>>>>
>>>>> === HELP!
>>>>>
>>>>> I tried to run jobs with the newly compiled 2.2.0 but
>>>>> unfortunately I still get the "Error, crambe/ already exists, change
>>>>> the -o parameter to another value." error.
>>>>>
>>>>> I get two types of errors:
>>>>>
>>>>> Whenever I start one node with 9 cores, it crashes after some 20
>>>>> minutes during assembly, so I am guessing it is some RAM issue,
>>>>> although I am "only" running 3.2 M sequences. I do get the "Error,
>>>>> crambe/ already exists, change the -o parameter to another value."
>>>>> message, but it does not seem to affect it. Actually, when I ask for
>>>>> individual outfiles per core, only one of the files gets written to.
>>>>>
>>>>> When I spread among different nodes, i.e. 4 nodes with 4 jobs per
>>>>
>>>> You mean 4 MPI ranks per node, right?
>>>>
>>>>> node, the job cancels within a minute or so, and the individual
>>>>> outfiles for most of the cores have the following last lines:
>>>>>
>>>>> Enabling CorePlugin 'TaxonomyViewer'
>>>>> Enabling memory usage reporting.
>>>>> Error, crambe/ already exists, change the -o parameter to another value.
>>>>>
>>>>
>>>> What is your file system? And is it shared between your nodes?
>>>>
>>>>> Rank 0: assembler memory usage: 37992 KiB
>>>>> Rank 0: assembler memory usage: 103716 KiB
>>>>> Rank 0: Rank= 0 Size= 1 ProcessIdentifier= 28047
>>>>>
>>>>> Nodes 08 and 11 have no errors and finish with
>>>>>
>>>>> Rank 0 is loading sequence reads
>>>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy loading (please wait...)
>>>>> Rank 0 has 0 sequence reads
>>>>> Rank 0: assembler memory usage: 142416 KiB
>>>>> Rank 0 has 100000 sequence reads
>>>>> Rank 0: assembler memory usage: 142416 KiB
>>>>> Rank 0 has 200000 sequence reads
>>>>> Rank 0: assembler memory usage: 142416 KiB
>>>>>
>>>>> Nodes 09 and 10 have no errors and finish with
>>>>>
>>>>> Rank 0 is loading sequence reads
>>>>> Rank 0 : partition is [0;3602181], 3602182 sequence reads
>>>>> Rank 0 is fetching file /scratch/suzukim/mira/CCB1a.fastq with lazy loading (please wait...)
>>>>>
>>>>> Marcelino
>>>>>
>>>>> On Apr 16, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>
>>>>>> To compile the git version:
>>>>>>
>>>>>>
>>>>>> git clone git://github.com/sebhtml/ray.git
>>>>>> git clone git://github.com/sebhtml/RayPlatform.git
>>>>>>
>>>>>> cd ray
>>>>>> make
>>>>>>
>>>>>> On 16/04/13 04:09 PM, Marcelino Suzuki wrote:
>>>>>>> Hello and thanks
>>>>>>>
>>>>>>> Just so you know, I am getting the same type of compilation
>>>>>>> errors with other .cpp files as well (after I excluded Amos.cpp):
>>>>>>> code/plugin_CoverageGatherer/CoverageGatherer.cpp (at least, perhaps
>>>>>>> others).
>>>>>>>
>>>>>>> I will wait for the new version, and will try on a different
>>>>>>> cluster where I compiled with gcc.
>>>>>>>
>>>>>>> Marcelino
>>>>>>>
>>>>>>> On Apr 16, 2013, at 10:00 PM, Sébastien Boisvert wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> v2.2.0 should be released soon. There is one ticket left.
>>>>>>>>
>>>>>>>> Otherwise, I don't think this issue is critical as it does not
>>>>>>>> prevent assemblies from completing.
>>>>>>>>
>>>>>>>> On 16/04/13 03:35 PM, Marcelino Suzuki wrote:
>>>>>>>>> Hello again
>>>>>>>>>
>>>>>>>>> Well, I do not really know how to add the changes using git
>>>>>>>>> (other than editing the .cpp files from v2.1.0, which are not quite
>>>>>>>>> the same as the ones in the commit). I tried to clone the latest ray
>>>>>>>>> and RayPlatform, and after some tweaking to use the Intel compiler
>>>>>>>>> (suggested by my sys admin), the compilation crashed at
>>>>>>>>> plugin_Amos.cpp.
>>>>>>>>> Maybe you can give me some pointers on how to fix
>>>>>>>>> 2.1.0 (which I was able to compile).
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> commands
>>>>>>>>> cd
>>>>>>>>> git clone https://bitbucket.org/sebhtml/ray.git
>>>>>>>>> mv ray /work/OOBMECO/bin
>>>>>>>>> git clone https://bitbucket.org/sebhtml/rayplatform.git
>>>>>>>>> mv rayplatform/ /work/OOBMECO/bin
>>>>>>>>> cd /work/OOBMECO/bin/ray/
>>>>>>>>> source /opt/cluster/gcc-4.4.3/refresh.sh
>>>>>>>>> source /opt/cluster/compilers/intel/Compiler/11.1/072/bin/iccvars.sh intel64
>>>>>>>>> source /opt/cluster/compilers/intel/impi/4.0.0.028/bin64/mpivars.sh
>>>>>>>>> find . -name *Makefile* | xargs -i sed -i "s/mpicxx/mpicxx -DMPICH_IGNORE_CXX_SEEK/g" {}
>>>>>>>>>
>>>>>>>>> edited the RayPlatform Makefile and added -DMPICH_IGNORE_CXX_SEEK
>>>>>>>>>
>>>>>>>>> make PREFIX=$PWD
>>>>>>>>>
>>>>>>>>> ERROR:
>>>>>>>>>
>>>>>>>>> CXX code/plugin_Amos/Amos.o
>>>>>>>>> code/plugin_Amos/Amos.cpp: In member function â:
>>>>>>>>> code/plugin_Amos/Amos.cpp:239: error: expected primary-expression before â token
>>>>>>>>> code/plugin_Amos/Amos.cpp:239: error: â was not declared in this scope
>>>>>>>>> code/plugin_Amos/Amos.cpp:240: error: expected primary-expression before â token
>>>>>>>>> code/plugin_Amos/Amos.cpp:240: error: â was not declared in this scope
>>>>>>>>> make: *** [code/plugin_Amos/Amos.o] Error 1
>>>>>>>>>
>>>>>>>>> On Apr 16, 2013, at 3:40 PM, Sébastien Boisvert wrote:
>>>>>>>>>
>>>>>>>>>> On 15/04/13 04:40 PM, Marcelino Suzuki wrote:
>>>>>>>>>>> Hello and thanks for the answer.
>>>>>>>>>>>
>>>>>>>>>>> I did understand that, and the directory does not exist when I
>>>>>>>>>>> launch the job. I do not get this error from the first 4 cores, but
>>>>>>>>>>> after core 5, I get the error in the output from all cores. The
>>>>>>>>>>> weirdest thing is that I had these messages in the outfile of a run
>>>>>>>>>>> that completed, and that is why I was wondering if that is normal.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This is a race condition affecting version 2.1.0.
>>>>>>>>>>
>>>>>>>>>> It was fixed by this commit:
>>>>>>>>>>
>>>>>>>>>> commit 73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>>> Author: Sébastien Boisvert <sebastien.boisver...@ulaval.ca>
>>>>>>>>>> Date: Fri Feb 8 23:55:34 2013 -0500
>>>>>>>>>>
>>>>>>>>>>     fix a race condition during directory probing
>>>>>>>>>>     Resolves-bug: https://github.com/sebhtml/ray/issues/125
>>>>>>>>>>     Signed-off-by: Sébastien Boisvert <sebastien.boisvert.3...@ulaval.ca>
>>>>>>>>>>
>>>>>>>>>> https://bitbucket.org/sebhtml/ray/commits/73ada528e1d17105e17df524fdb60bdd03b40560
>>>>>>>>>>
>>>>>>>>>> The upcoming v2.2.0 release will include this bug fix.
>>>>>>>>>>
>>>>>>>>>> This bug was introduced by the mini-ranks code.
>>>>>>>>>>
>>>>>>>>>> For example, if you are using a single machine with 32 cores,
>>>>>>>>>> you can use
>>>>>>>>>>
>>>>>>>>>> mpiexec -n 32 Ray ...
>>>>>>>>>>
>>>>>>>>>> or you can use
>>>>>>>>>>
>>>>>>>>>> mpiexec -n 4 Ray -mini-ranks-per-rank 7 ...
>>>>>>>>>>
>>>>>>>>>> Mini-ranks are experimental, but they seem to work nicely on
>>>>>>>>>> SMP NUMA machines.
>>>>>>>>>>
>>>>>>>>>>> Marcelino
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Apr 15, 2013, at 10:14 PM, Sébastien Boisvert wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 14/04/13 04:45 PM, Marcelino Suzuki wrote:
>>>>>>>>>>>>> Hello Sebastien
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am having issues assembling a 3.2 M read Ion Torrent dataset
>>>>>>>>>>>>> using LoadLeveler on an Intel iDataPlex cluster, with the MPI
>>>>>>>>>>>>> job crashing during execution.
>>>>>>>>>>>>> It might be an issue with the cluster (I got this message from
>>>>>>>>>>>>> the system administrator: "At the moment we are having problems
>>>>>>>>>>>>> with the IB network and the disk array"), but since I've seen an
>>>>>>>>>>>>> error in the output files coming from some cores using the
>>>>>>>>>>>>> -output-filename option, I just want to make sure I am not
>>>>>>>>>>>>> missing some detail:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error, crambe/ already exists, change the -o parameter to another value
>>>>>>>>>>>>>
>>>>>>>>>>>>> crambe is my -o directory
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is that normal behavior? I noticed I got the same errors in an
>>>>>>>>>>>>> outfile of the same assembly that did go through to the end.
>>>>>>>>>>>>
>>>>>>>>>>>> Ray will not overwrite an existing directory.
>>>>>>>>>>>>
>>>>>>>>>>>> You must use another directory name.
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>> Marcelino

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
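For readers wondering how a run can print "crambe/ already exists" when the directory was absent at launch, here is a minimal Python sketch of the general kind of problem a race during directory probing can cause. It is not Ray's code and does not reproduce the fix in commit 73ada528, whose contents are not shown in this thread. The helper names `claim_directory` and `run_workers` are hypothetical, and threads stand in for the ranks or mini-ranks. The idea: if every worker first checks whether the directory exists and only then creates it, a worker that checks late sees the directory made by an earlier one and reports it as pre-existing. Letting a single atomic `os.mkdir` call decide who created it avoids the separate check.

```python
import os
import shutil
import tempfile
import threading


def claim_directory(path):
    """Try to create path; return True only for the caller that created it.

    os.mkdir either creates the directory or raises FileExistsError, so
    there is no gap between "check" and "create" for another worker to
    slip into.
    """
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False


def run_workers(path, count):
    """Start `count` threads that all try to claim `path`; return their results."""
    results = [None] * count

    def worker(index):
        results[index] = claim_directory(path)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Hypothetical setup: 16 workers racing for one output directory.
base = tempfile.mkdtemp()
outcomes = run_workers(os.path.join(base, "crambe"), 16)
shutil.rmtree(base)
print(outcomes.count(True), "worker(s) created the directory")
```

In a design like this, only the worker that gets `True` would go on to decide whether the run may proceed; the others would treat `False` as "someone in my own job made it", not as an error.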