With the patch I went from 6*120 hrs (walltime was hit each time) to: resources_used.walltime=47:35:27
geez... I should have asked earlier :-) thanks

Louis

On 12-11-05 04:52 PM, Sébastien Boisvert wrote:
> On 11/05/2012 02:32 PM, Louis Letourneau wrote:
>> Darn, I forgot about that bug, and I saw it pass on the mailing list too.
>>
>> Sorry for the post then.
>>
>
> Not sure I sent an email about that patch ;-)
>
>> I saw your mini-rank posting. I think it's a wonderful idea, especially
>> since there are more and more cores per node now.
>>
>> Is the infinite loop fixed in the mini-rank codebase?
>
> It is not an infinite loop; it is just that a loop over a k-mer with
> a coverage of 99999 (a large coverage) takes a while with all the messages.
>
>> If not, should I just apply the patch?
>>
>
> Yeah, the patch fixes the long running time in the scaffolding.
>
>> Again, thanks for the great work.
>>
>
> Thanks for the testing!
>
>> Louis
>>
>> On 12-11-05 12:17 PM, Sébastien Boisvert wrote:
>>> On 11/05/2012 11:14 AM, Louis Letourneau wrote:
>>>> I have assembled 2 >2.5 Gb genomes (not the same, both mammals) in about
>>>> 48 hrs using 2025 cores. This works great.
>>>>
>>>
>>> Nice.
>>>
>>>> I'm trying to assemble a fish and I am having issues I don't quite know
>>>> how to debug.
>>>>
>>>> The fish is about 1.9 Gb in size and not diploid.
>>>>
>>>> If I run Ray using the paired + mates with k31, I was able to assemble
>>>> it in 168 hours (I needed to restart after 120 hours because of
>>>> walltime... thanks for the checkpoints :-) ).
>>>>
>>>
>>> That's quite long; what's the latency?
>>>
>>> We are working on a new programming model called "mini-ranks" to make
>>> better use of supercomputers with many nodes, but also with many cores
>>> per node.
>>>
>>> Ray uses RayPlatform, and RayPlatform uses MPI. In the new model,
>>> RayPlatform uses "mini-ranks".
>>>
>>> The current model in RayPlatform is the pure MPI programming model, which
>>> can be really bad on some supercomputers if there is just one network card
>>> on each node that must serve many MPI processes.
>>>
>>> If you are interested, we have an experimental branch for mini-ranks that
>>> uses only 1 MPI process per node, and as many IEEE POSIX threads as
>>> mini-ranks (one thread per mini-rank).
>>>
>>> With mini-ranks, the routing code in RayPlatform will become obsolete!
>>>
>>> Some latency results:
>>>
>>> Table 1: Comparison of MPI ranks with mini-ranks on the Colosse
>>> supercomputer at Laval University.
>>> +-------+---------------------------------------------------+
>>> | Cores | Average round-trip latency (us)                   |
>>> +-------+-----------------------+---------------------------+
>>> |       | MPI ranks             | mini-ranks                |
>>> |       | (pure MPI)            | (MPI + pthread)           |
>>> +-------+-----------------------+---------------------------+
>>> |     8 |  11.25   +/- 0        | 24.1429 +/- 0             |
>>> |    16 |  35.875  +/- 6.92369  | 43.0179 +/- 8.76275       |
>>> |    32 |  66.3125 +/- 6.76387  | 41.7143 +/- 1.23924       |
>>> |    64 |  90      +/- 16.5265  | 37.75   +/- 6.41984       |
>>> |   128 | 126.562  +/- 25.0116  | 43.0179 +/- 8.76275       |
>>> |   256 | 203.637  +/- 67.4579  | 44.6429 +/- 6.11862       |
>>> +-------+-----------------------+---------------------------+
>>>
>>> If you want to try that:
>>>
>>> git clone git@github.com:sebhtml/RayPlatform.git
>>> cd RayPlatform; git checkout minirank-model; cd ..
>>> git clone git@github.com:sebhtml/ray.git
>>> cd ray; git checkout minirank-model
>>> make
>>>
>>> then, to run on 100 nodes with 24 cores per node:
>>>
>>> mpiexec -n 100 -bynode Ray -mini-ranks-per-rank 23 \
>>>     ...
>>>
>>> Notes:
>>>
>>> 1. The -bynode is necessary in Open-MPI because the default is -byslot.
>>> -byslot will also work if the job scheduler presents the slots in a
>>> by-node round-robin strategy.
>>>
>>> 2. It is important to start 23 mini-ranks per MPI process and not 24,
>>> because each MPI process also has a communication thread, and you
>>> don't want to oversubscribe the cores on the CPU at all.
>>>
>>> 3. The mini-rank code contains 0 (zero) locks, 0 mutexes, 0 spinlocks,
>>> and 0 semaphores.
>>> The code is non-blocking and lock-free, which is why it works so well.
>>>
>>> 4. This work should be merged once I have made additional sanity checks.
>>>
>>> 5. If you want to look at the code, the class MessageQueue is particularly
>>> interesting.
>>>
>>>> It worked (although the assembly wasn't great, possibly due to a lot of
>>>> repeats), but took way longer than the bigger genomes.
>>>>
>>>> I'm trying the same without the mates. I also changed the k-mer from k31
>>>> to k61.
>>>>
>>>> I have hit walltime 5 times now, 120 hours each time, and it's not
>>>> finished.
>>>>
>>>> The variables that changed are the k-mer length and no mates.
>>>>
>>>> The first run went through many steps in the log.
>>>> Since the first walltime, the only output I seem to be getting is
>>>>
>>>> Rank X: gathering scaffold links [Y/2987] [Z/7166]
>>>>
>>>> (X, Y, Z vary of course)
>>>>
>>>
>>> Known bug where Ray stalls too long on repeats...
>>>
>>> https://github.com/sebhtml/ray/issues/91
>>>
>>> This is because of a bug (1 month old, actually). I have a patch in the
>>> queue, but I am not satisfied with its overall impact. The patch does fix
>>> the long running time, though. I will solve this bug in the scaffolder
>>> when I have time. Meanwhile, you can use the patch, which solves the
>>> problem, but it's a dirty hack.
>>>
>>> You can test this patch:
>>>
>>> wget http://downloads.sourceforge.net/project/denovoassembler/Ray-v2.1.0.tar.bz2
>>> tar -xjf Ray-v2.1.0.tar.bz2
>>> cd Ray-v2.1.0
>>> wget https://github.com/sebhtml/patches/raw/master/ray/human-seb-from-13efb22270e4f563c9cafc.patch
>>> patch -p1 < human-seb-from-13efb22270e4f563c9cafc.patch
>>> make ...
>>>
>>>> I was using a version compiled from sources for the polytope routing.
>>>>
>>>
>>> As I said, "mini-ranks" *will* supersede the virtual routing subsystem.
>>> The problem with virtual routing is that it increases the number of
>>> physical hops. With mini-ranks, that is not the case at all.
>>>
>>>> Any ideas?
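The core-count rule in note 2 above (cores per node minus one for the communication thread) can be sketched as a small launcher script. The counts match the 100-node/24-core example; the variable names are hypothetical, and the final command is only echoed here since an actual run needs an MPI allocation:

```shell
#!/bin/sh
# Reserve one core per node for the RayPlatform communication thread.
NODES=100                            # nodes in the job (assumption: matches your allocation)
CORES_PER_NODE=24                    # physical cores on each node
MINI_RANKS=$((CORES_PER_NODE - 1))   # 23: leave one core for the communication thread

# Dry run: print the command that would be submitted.
echo mpiexec -n "$NODES" -bynode Ray -mini-ranks-per-rank "$MINI_RANKS"
```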
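For context on notes 3 and 5 above, here is a generic sketch (not the actual RayPlatform code) of the kind of lock-free single-producer/single-consumer ring buffer that lets one thread hand messages to another without any mutex, spinlock, or semaphore; the class and member names are made up for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-free single-producer/single-consumer ring buffer (illustrative only).
// One thread (e.g. a mini-rank) pushes; another (e.g. the communication
// thread) pops. Correct only with exactly one producer and one consumer.
template<typename T, uint32_t Capacity>
class SpscQueue {
    T m_buffer[Capacity];
    std::atomic<uint32_t> m_head{0}; // next slot to pop  (written by consumer)
    std::atomic<uint32_t> m_tail{0}; // next slot to push (written by producer)
public:
    // Producer side; returns false when the queue is full.
    bool push(const T& value) {
        uint32_t tail = m_tail.load(std::memory_order_relaxed);
        uint32_t next = (tail + 1) % Capacity;
        if (next == m_head.load(std::memory_order_acquire))
            return false; // full: one slot stays empty to distinguish full from empty
        m_buffer[tail] = value;
        m_tail.store(next, std::memory_order_release); // publish the slot
        return true;
    }
    // Consumer side; returns false when the queue is empty.
    bool pop(T& value) {
        uint32_t head = m_head.load(std::memory_order_relaxed);
        if (head == m_tail.load(std::memory_order_acquire))
            return false; // empty
        value = m_buffer[head];
        m_head.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }
};

int main() {
    SpscQueue<int, 4> queue; // capacity 4 => holds at most 3 messages
    assert(queue.push(10));
    assert(queue.push(20));
    assert(queue.push(30));
    assert(!queue.push(40)); // full

    int message = 0;
    assert(queue.pop(message) && message == 10);
    assert(queue.pop(message) && message == 20);
    assert(queue.pop(message) && message == 30);
    assert(!queue.pop(message)); // empty again
    return 0;
}
```

The acquire/release pairing on head and tail is what removes the need for a lock: each side only ever writes its own index and reads the other's.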
>>>>
>>>
>>> To wrap up:
>>>
>>> 1. Try mini-ranks;
>>> 2. Try the patch.
>>>
>>> p.s.: I should resume the patchwork and branch merging once I am done
>>> implementing the reviewers' concerns for my Debian package and Fedora
>>> package for Ray.
>>>
>>> p.s.2: For your information, our paper about Ray Meta should appear
>>> somewhere in the near future; it is in re-review (the reviewers are
>>> assessing our revised manuscript).
>>>
>>>> Louis
>>>>
>>>> _______________________________________________
>>>> Denovoassembler-users mailing list
>>>> Denovoassembler-users@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/denovoassembler-users