Hello Sébastien!

It seems to me the speed difference comes from the Ray version. I did not pay 
much attention to the version when I first noticed the speed difference, and 
had assumed the newer version would be faster. 
I tested Ray-2.2.0 and Ray-2.3.1 compiled under exactly the same conditions:

GNU gcc/g++ 4.8.2 
MPI standard version 2.1
MPI library: Open-MPI 1.6.5
MAXKMERLENGTH=255
MPI_IO=y

The following tables show two datasets run on a single machine (Linux box3 
3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux, 128GB RAM). 

------------------------------------------
Ray-2.2.0   dataset1   dataset2  (seconds)
------------------------------------------
k17            218        313
k19            163        332
k21            152        326
------------------------------------------

------------------------------------------
Ray-2.3.1   dataset1   dataset2  (seconds)
------------------------------------------
k17           1224       2070
k19            907       2111
k21            823       2052
------------------------------------------
Across these runs, Ray-2.3.1 is about 5.4x to 6.6x slower than Ray-2.2.0.
I thought this may be useful for future releases of this great software. 
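
As a quick check, the slowdown can be recomputed directly from the two tables 
above. This is only a sketch: the timings are the ones reported in this 
message, while the variable and function names are made up for illustration.

```python
# Wall-clock times in seconds, copied from the two tables above.
# Keys are k-mer lengths; each tuple is (dataset1, dataset2).
ray_2_2_0 = {17: (218, 313), 19: (163, 332), 21: (152, 326)}
ray_2_3_1 = {17: (1224, 2070), 19: (907, 2111), 21: (823, 2052)}

def slowdown(old, new):
    """Return {k: (ratio_dataset1, ratio_dataset2)} of new time / old time."""
    return {
        k: tuple(n / o for o, n in zip(old[k], new[k]))
        for k in sorted(old)
    }

ratios = slowdown(ray_2_2_0, ray_2_3_1)
for k, (r1, r2) in ratios.items():
    print(f"k{k}: dataset1 {r1:.2f}x, dataset2 {r2:.2f}x")
```

The smallest ratio is for dataset1 at k21 (about 5.4x) and the largest for 
dataset2 at k17 (about 6.6x).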

Thank you!

Yifang
________________________________________________________________________
Bioinformatics Support Specialist | Bioinformatique soutien Specialis
National Research Council of Canada | Conseil national de recherches Canada
Government of Canada | Gouvernement du Canada
110 Gymnasium Place|110, place Gymnasium
Saskatoon, Saskatchewan
S7N 0W9
Tel / Tél : 306-975-5279
Fax | Télécopieur : 306-975-4839
________________________________________
From: Tan, Yifang [yifang....@nrc-cnrc.gc.ca]
Sent: Friday, March 14, 2014 1:31 PM
To: Sébastien Boisvert
Cc: denovoassembler-users@lists.sourceforge.net
Subject: Re: [Denovoassembler-users] Ray on Redhat vs Debian

Hello Sébastien!
I am still struggling with the speed comparison between different boxes 
running Ray. It has become very slow, and I could not figure out the reason.
Last week I measured the running time on different boxes, and all of them 
were very slow (around 2000 seconds!):
box3: 2070 sec.
box4: 1986 sec.
box5: 2087 sec.

Linux box3 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux
Linux box4 3.10-2-amd64 #1 SMP Debian 3.10.5-1 (2013-08-07) x86_64 GNU/Linux
Linux box5 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux

My dataset has 716165 PE reads (mean R1 length 191bp, mean R2 length 154bp) 
and 78466 single-end reads (mean length 196bp). I tried to assemble this 
dataset with Ray-2.3.1 using:

$ mpiexec -n 20 Ray -k 17 -p S36_PE_R1.fasta S36_PE_R2.fasta -s S36_SE.fasta \
    -o $OUT_ROAD/S36_k17

It took 34~35 minutes to finish. My fastest record was ~120 seconds, as posted 
on February 21 for a similar dataset; that message is attached at the end of 
this one for your reference.
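
Since the same command has to be repeated for many k-mer lengths, the sweep 
can be scripted. A minimal sketch follows, reusing the input file names from 
the command above. The function name `ray_command`, the output root "out" and 
the range of k values are illustrative only. The sweep is limited to odd k on 
the assumption that Ray expects odd k-mer lengths, which is worth confirming 
against the Ray documentation for the installed version.

```python
import shlex

def ray_command(k, out_root, ranks=20):
    """Build the argument list for one Ray run at k-mer length k."""
    return [
        "mpiexec", "-n", str(ranks), "Ray",
        "-k", str(k),
        "-p", "S36_PE_R1.fasta", "S36_PE_R2.fasta",
        "-s", "S36_SE.fasta",
        "-o", f"{out_root}/S36_k{k}",
    ]

# Odd k-mer lengths only (assumption: Ray rejects even values).
kmers = range(17, 32, 2)

for k in kmers:
    # Print the commands instead of running them; pass the list to
    # subprocess.run(cmd, check=True) to actually launch each assembly.
    print(shlex.join(ray_command(k, "out")))
```

Building the command as a list rather than a string avoids quoting problems 
if a sample name ever contains spaces.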

1) What is the typical running time to assemble a dataset like this? A 
theoretical estimate or a figure from your experience would help. As I need to 
test different k-mer lengths (15~255) for ~8,000 samples (BACs), speed is a 
big concern for me.

2) I compiled Ray-2.3.1 with MAXKMERLENGTH=255 and MPI_IO=y. Do these options 
affect the running speed?

3) I got an error message while testing Ray with this dataset. It looks very 
similar to one from an old post of mine (I cannot remember when it was, but it 
was certainly before v2.3.1):
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Date: Fri Mar 14 15:58:34 2014
VirtualProcessor: completed jobs: 8
Rank 14 : VirtualCommunicator (service provided by VirtualCommunicator): 486319 
virtual messages generated 483451 real messages (99.4103%)
Error: can not add CCCAAGAGGCCCATGCA
last objects:
 [9049] ------> GGTGTGCCAAACATCAC
 [9050] ------> GTGTGCCAAACATCACA
 [9051] ------> TGTGCCAAACATCACAA
 [9052] ------> GTGCCAAACATCACAAC
 [9053] ------> TGCCAAACATCACAACG
 [9054] ------> GCCAAACATCACAACGT
 [9055] ------> CCAAACATCACAACGTA
 [9056] ------> CAAACATCACAACGTAA
 [9057] ------> AAACATCACAACGTAAC
 [9058] ------> AACATCACAACGTAACT
 [9059] ------> ACATCACAACGTAACTG
 [9060] ------> CATCACAACGTAACTGG
 [9061] ------> ATCACAACGTAACTGGG
 [9062] ------> TCACAACGTAACTGGGT
 [9063] ------> CACAACGTAACTGGGTG
 [9064] ------> ACAACGTAACTGGGTGA
Rank 17 JoinerTaskCreator [8/8]
Statistics: all paths: 4 eliminated during joining: 1
Rank 17: assembler memory usage: 175084 KiB
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
I would appreciate any suggestions or recommendations for debugging these 
issues.
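
To put question 1 in perspective, here is a back-of-the-envelope estimate of 
the total time for the planned sweep. It assumes every odd k from 15 to 255 is 
run for each of 8,000 samples, one run at a time, and that every run takes as 
long as one of the two timings quoted above. All of these are simplifying 
assumptions, not measurements.

```python
SAMPLES = 8000                      # "~8,000 samples (BACs)"
KMERS = range(15, 256, 2)           # odd k from 15 to 255 (assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600

def total_years(seconds_per_run):
    """Serial wall-clock time, in years, for the whole sweep."""
    runs = SAMPLES * len(KMERS)
    return runs * seconds_per_run / SECONDS_PER_YEAR

for label, secs in [("fast run (~120 s)", 120), ("slow run (2070 s)", 2070)]:
    print(f"{label}: {total_years(secs):.1f} years")
```

Under these assumptions the sweep is 968,000 runs, which is years of serial 
time even at the faster rate, so the range of k and the number of samples run 
in parallel matter a great deal.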

Yifang

________________________________________
From: Tan, Yifang
Sent: Friday, February 21, 2014 11:07 AM
To: Sébastien Boisvert
Subject: RE: Ray on Redhat vs Debian

Thanks!
I am aware of the factors that may be involved.
Yes, my admin said NUMAlink is involved. My reads are stored on a mounted 
RAID storage disk. However, another recent observation is that Ray also ran 
very slowly on the same Debian box. This huge slowdown has bugged us a lot, 
and my admin could not track down the cause either, so I am seeking 
suggestions here.
May I ask the question another way: does I/O traffic to and from the storage 
disk affect the speed of Ray, and if so, how? What is the maximum traffic load 
Ray can tolerate for reading and writing data? (This may be a very silly 
question, but I am kind of desperate.)
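
One rough way to check whether the storage is a bottleneck is to time a plain 
sequential read of an input file from the mounted disk and compare it with a 
copy on local disk. The sketch below is a generic measurement, not part of 
Ray; the function name and chunk size are arbitrary. A repeated read may be 
served from the operating system's page cache, which would overstate the 
disk's speed.

```python
import time

def read_throughput(path, chunk_size=1 << 20):
    """Read a file sequentially; return (bytes_read, megabytes_per_second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

For example, calling `read_throughput("S36_PE_R1.fasta")` once on the RAID 
mount and once on a local copy gives two numbers that can be compared.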

Thank you!

Yifang
________________________________________
From: Sébastien Boisvert [sebastien.boisver...@ulaval.ca]
Sent: Tuesday, February 18, 2014 9:25 AM
To: Tan, Yifang; denovoassembler-users@lists.sourceforge.net
Subject: [Denovoassembler-users] RE : Ray on Redhat vs Debian

On 17 February 2014 10:57, Tan, Yifang [yifang....@nrc-cnrc.gc.ca] wrote:
> To: Sébastien Boisvert; denovoassembler-users@lists.sourceforge.net
> Subject: Ray on Redhat vs Debian
>
> Hello Sebastien!
>
> I have a question about the speed difference of running Ray on two Linux 
> distributions:
> Redhat: Linux box1 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> CPU core#: 160
> RAM:  1TB (yes, 1TB)
>
> Debian: Linux box2 3.10-2-amd64 #1 SMP Debian 3.10.5-1 (2013-08-07) x86_64 
> GNU/Linux
> CPU core#: 24
> RAM: 128GB
>
> The reason for my enquiry is that my assembly on the Redhat box is much 
> slower than on Debian. Of course, I used exactly the same assembly parameters.
> Here is a table of total assembly time (seconds) for different k-mer lengths 
> on our two boxes:
> kmer    Debian    Redhat
> 11         115       460
> 15         115       464
> 21         126       529
> 31         117       497
>
> I thought that using more CPU cores and more RAM would speed up the 
> assembly, but that did not work as I expected.
> Sometimes the difference was huge and the assembly on my Redhat box was 
> extremely slow. I wonder what on the operating system side may cause this 
> difference, so that I can ask my sysadmin to adjust the configuration, or 
> simply avoid RedHat Linux for my assembly.
>
> Thank you!


I feel like you are comparing much more than just operating systems here (Red 
Hat vs Debian).

For instance, the hardware is different (memory is different). Maybe the one 
with 1 TB RAM has a NUMAlink [1] connecting memory nodes
to CPUs. Or maybe your two machines don't even have the same CPU model.


[1] http://en.wikipedia.org/wiki/NUMAlink
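
Before comparing timings, it can help to record a few basic facts on each 
machine and check that they match. The following is a minimal sketch using 
only the Python standard library. The function name and the choice of fields 
are illustrative, and the CPU model lookup assumes a Linux /proc/cpuinfo with 
a "model name" field.

```python
import os
import platform

def host_summary():
    """Collect a few facts that should match before timings are compared."""
    info = {
        "hostname": platform.node(),
        "kernel": platform.release(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "cpu_model": None,
    }
    # /proc/cpuinfo exists on Linux; x86 kernels report a "model name" field.
    try:
        with open("/proc/cpuinfo") as handle:
            for line in handle:
                if line.lower().startswith("model name"):
                    info["cpu_model"] = line.split(":", 1)[1].strip()
                    break
    except OSError:
        pass  # Not Linux, or /proc is unavailable; leave cpu_model as None.
    return info

for key, value in host_summary().items():
    print(f"{key}: {value}")
```

This does not cover the NUMA layout; on Linux, `lscpu` reports the number of 
NUMA nodes.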

>
> Yifang
>

_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
