Hello Sébastien!

It seems the speed difference came from the Ray version. I did not pay much attention to the version difference when I first noticed the slowdown, and had assumed the newer version would be faster. I tested Ray-2.2.0 and Ray-2.3.1, compiled under exactly the same conditions:

GNU gcc/g++ 4.8.2
MPI standard version 2.1
MPI library: Open-MPI 1.6.5
MAXKMERLENGTH=255
MPI_IO=y

The following tables are from two datasets run on a single machine (Linux box3 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux, 128GB RAM). Times are in seconds.

------------------------------------------
Ray-2.2.0      dataset1      dataset2
------------------------------------------
k17            218           313
k19            163           332
k21            152           326
------------------------------------------

------------------------------------------
Ray-2.3.1      dataset1      dataset2
------------------------------------------
k17            1224          2070
k19            907           2111
k21            823           2052
------------------------------------------

Ray-2.3.1 is roughly 5.5x to 6.5x slower than Ray-2.2.0. I thought this might be useful for future upgrades of this great software.

Thank you!

Yifang
________________________________________________________________________
Bioinformatics Support Specialist | Bioinformatique soutien Specialis
National Research Council of Canada | Conseil national de recherches Canada
Government of Canada | Gouvernement du Canada
110 Gymnasium Place|110, place Gymnasium
Saskatoon, Saskatchewan S7N 0W9
Tel / Tél : 306-975-5279
Fax | Télécopieur : 306-975-4839
________________________________________
From: Tan, Yifang [yifang....@nrc-cnrc.gc.ca]
Sent: Friday, March 14, 2014 1:31 PM
To: Sébastien Boisvert
Cc: denovoassembler-users@lists.sourceforge.net
Subject: Re: [Denovoassembler-users] Ray on Redhat vs Debian

Hello Sébastien!

I am still struggling with the speed comparison between different boxes running Ray, which has turned out to be very slow now, and I could not figure out the reason. Last week I tested the running time on different boxes, and it turned out very slow (~2077 seconds!!!):

box3: 2070 sec.
box4: 1986 sec.
box5: 2087 sec.

Linux box3 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux
Linux box4 3.10-2-amd64 #1 SMP Debian 3.10.5-1 (2013-08-07) x86_64 GNU/Linux
Linux box5 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux

My dataset is 716165 PE reads (mean R1 length 191bp, mean R2 length 154bp) and 78466 single-end reads (mean length 196bp). I tried to assemble this dataset with Ray-2.3.1 using:

$ mpiexec -n 20 Ray -k 17 -p S36_PE_R1.fasta S36_PE_R2.fasta -s S36_SE.fasta -o $OUT_ROAD/S36_k17

It took 34~35 minutes to finish. My fastest record was ~120 seconds, as posted on February 21 for a similar dataset (attached at the end of this message for your reference).

1) What is the average running time to assemble this dataset? A theoretical estimate or a figure from your experience would be helpful. As I need to test different kmers (15~255) for ~8,000 samples (BACs), speed is a big concern for me.

2) I compiled Ray-2.3.1 with MAXKMERLENGTH=255 and MPI_IO=y. Do these options affect the running speed?

3) I got an error message while testing Ray with this dataset, which seems very similar to one from an old post of mine (I cannot remember when it was, but it was certainly before v2.3.1):

-------------------------------------------------------------------------
Date: Fri Mar 14 15:58:34 2014
VirtualProcessor: completed jobs: 8
Rank 14 : VirtualCommunicator (service provided by VirtualCommunicator): 486319 virtual messages generated 483451 real messages (99.4103%)
Error: can not add CCCAAGAGGCCCATGCA
last objects:
[9049] ------> GGTGTGCCAAACATCAC
[9050] ------> GTGTGCCAAACATCACA
[9051] ------> TGTGCCAAACATCACAA
[9052] ------> GTGCCAAACATCACAAC
[9053] ------> TGCCAAACATCACAACG
[9054] ------> GCCAAACATCACAACGT
[9055] ------> CCAAACATCACAACGTA
[9056] ------> CAAACATCACAACGTAA
[9057] ------> AAACATCACAACGTAAC
[9058] ------> AACATCACAACGTAACT
[9059] ------> ACATCACAACGTAACTG
[9060] ------> CATCACAACGTAACTGG
[9061] ------> ATCACAACGTAACTGGG
[9062] ------> TCACAACGTAACTGGGT
[9063] ------> CACAACGTAACTGGGTG
[9064] ------> ACAACGTAACTGGGTGA
Rank 17 JoinerTaskCreator [8/8] Statistics: all paths: 4 eliminated during joining: 1
Rank 17: assembler memory usage: 175084 KiB
-------------------------------------------------------------------------

I would appreciate any suggestions and recommendations for debugging these issues.

Yifang
________________________________________
From: Tan, Yifang
Sent: Friday, February 21, 2014 11:07 AM
To: Sébastien Boisvert
Subject: RE: Ray on Redhat vs Debian

Thanks! I am aware of the factors that may be involved. Yes, my admin said NUMAlink is involved. My reads are stored on a mounted RAID storage disk. However, another recent observation is that Ray ran very slowly on the same Debian box. This huge slowdown has bothered us a lot, and my admin could not track down the cause either, so I am seeking suggestions here.

May I ask the question another way: does I/O traffic to and from the storage disk affect the speed of Ray, and if so, how? What is the maximum traffic load Ray can tolerate for reading and writing data? (This may be a very silly question, but I am rather desperate.)

Thank you!

Yifang
________________________________________
From: Sébastien Boisvert [sebastien.boisver...@ulaval.ca]
Sent: Tuesday, February 18, 2014 9:25 AM
To: Tan, Yifang; denovoassembler-users@lists.sourceforge.net
Subject: [Denovoassembler-users] RE: Ray on Redhat vs Debian

On 17 February 2014 10:57, Tan, Yifang [yifang....@nrc-cnrc.gc.ca] wrote:
> To: Sébastien Boisvert; denovoassembler-users@lists.sourceforge.net
> Subject: Ray on Redhat vs Debian
>
> Hello Sebastien!
>
> I have a question about the speed difference of running Ray on two Linux
> distributions:
>
> Redhat: Linux box1 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> CPU core#: 160
> RAM: 1TB (yes, 1TB)
>
> Debian: Linux box2 3.10-2-amd64 #1 SMP Debian 3.10.5-1 (2013-08-07) x86_64 GNU/Linux
> CPU core#: 24
> RAM: 128GB
>
> The reason for my enquiry is that my assembly on the Redhat box is much
> slower than on Debian. Of course I used exactly the same assembly parameters.
> Here is a table of total assembly time (seconds) with different kmers on
> our two boxes:
>
> kmer    Debian    Redhat
> 11      115       460
> 15      115       464
> 21      126       529
> 31      117       497
>
> I thought that using more CPU cores with more RAM would speed up the
> assembly, but it did not work as I expected.
> Sometimes the difference was huge and the assembly on my Redhat box was
> extremely slow. I was wondering what may cause this difference on the
> operating system side, so that I could ask my sysadmin to adjust the
> configuration, or just avoid RedHat Linux for my assembly.
>
> Thank you!

I feel like you are comparing much more than just operating systems here (Red Hat vs Debian).

For instance, the hardware is different (memory is different). Maybe the one with 1 TB RAM has a NUMAlink [1] connecting memory nodes to CPUs. Or maybe your two machines don't even have the same CPU model.

[1] http://en.wikipedia.org/wiki/NUMAlink

>
> Yifang

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
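
As a quick check of the slowdown reported in the most recent message at the top of this thread, the Ray-2.3.1 timings can be divided by the Ray-2.2.0 timings for each k-mer length and dataset. The following is a minimal Python sketch of that arithmetic. The timings are copied from the two tables in that message; the names `RAY_2_2_0`, `RAY_2_3_1` and `slowdown_ratios` are illustrative only and are not part of Ray.

```python
# Wall-clock times in seconds, copied from the two tables in the thread.
# Keys are k-mer lengths; values are (dataset1, dataset2).
# The dictionary and function names are hypothetical, chosen for this sketch.
RAY_2_2_0 = {17: (218, 313), 19: (163, 332), 21: (152, 326)}
RAY_2_3_1 = {17: (1224, 2070), 19: (907, 2111), 21: (823, 2052)}


def slowdown_ratios(old, new):
    """Return {k: (ratio_dataset1, ratio_dataset2)} with ratio = new / old."""
    return {
        k: tuple(n / o for o, n in zip(old[k], new[k]))
        for k in sorted(old)
    }


if __name__ == "__main__":
    ratios = slowdown_ratios(RAY_2_2_0, RAY_2_3_1)
    for k, (d1, d2) in ratios.items():
        print(f"k{k}: dataset1 {d1:.2f}x, dataset2 {d2:.2f}x")
    # Flatten all six ratios to report the overall range.
    flat = [r for pair in ratios.values() for r in pair]
    print(f"range: {min(flat):.2f}x to {max(flat):.2f}x")
```

With these numbers the six ratios fall between about 5.4x and 6.6x, which is consistent with the "5.5x to 6.5x" estimate given in the message.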