Re: [Biojava-l] RichSequence.IOTools performance

Khalil El Mazouari Mon, 28 Mar 2011 10:12:53 -0700

Sequence objects are all in-memory.
I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will process 
100,000 seqs in each run, and IO is a real bottleneck. So I am trying, as far 
as I can, to fine-tune the app.
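
One cheap way to act on the OutputStream suggestion quoted below, before 
reaching for NIO, is to wrap the raw stream in a BufferedOutputStream once and 
pass the wrapper to the writer for every sequence. The sketch below is only an 
illustration under assumptions: it does not use BioJava at all. RecordWriter, 
PLAIN_WRITER, CountingSink and writeAll are hypothetical names, and the 
String records stand in for sequence objects. In the real app the call inside 
the loop would be the RichSequence.IOTools write call, given the buffered 
stream instead of the raw one. Whether this helps depends on whether the 
stream being written to is already buffered, so it should be checked with the 
profiler.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BufferedWriteSketch {

    /** Hypothetical stand-in for a per-sequence writer (not a BioJava type). */
    interface RecordWriter {
        void write(OutputStream os, String record) throws IOException;
    }

    /** Toy writer: dumps the record text as UTF-8 bytes in one call. */
    static final RecordWriter PLAIN_WRITER =
            (os, record) -> os.write(record.getBytes(StandardCharsets.UTF_8));

    /** In-memory sink that counts how many write calls actually reach it. */
    static class CountingSink extends ByteArrayOutputStream {
        int writeCalls = 0;

        @Override
        public synchronized void write(int b) {
            writeCalls++;
            super.write(b);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            writeCalls++;
            super.write(b, off, len);
        }
    }

    /**
     * Writes every record through one shared buffer and flushes once at the
     * end, so the underlying sink sees a few large writes instead of one
     * small write per record.
     */
    static void writeAll(OutputStream rawSink, List<String> records,
                         RecordWriter writer, int bufferSize) throws IOException {
        BufferedOutputStream buffered = new BufferedOutputStream(rawSink, bufferSize);
        for (String record : records) {
            writer.write(buffered, record);
        }
        buffered.flush(); // push whatever is still sitting in the buffer
    }

    public static void main(String[] args) throws IOException {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            records.add("LOCUS seq" + i + "\n//\n");
        }

        // Unbuffered: every record goes straight to the sink.
        CountingSink direct = new CountingSink();
        for (String record : records) {
            PLAIN_WRITER.write(direct, record);
        }

        // Buffered: records are collected in a 64 KB buffer first.
        CountingSink viaBuffer = new CountingSink();
        writeAll(viaBuffer, records, PLAIN_WRITER, 64 * 1024);

        System.out.println("direct write calls:   " + direct.writeCalls);
        System.out.println("buffered write calls: " + viaBuffer.writeCalls);
    }
}
```

The bytes written are the same in both cases; only the number of calls that 
reach the underlying stream changes. With a file or socket underneath, each 
of those calls can be expensive, which is where a saving would come from.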


Regards,

khalil

On 28 Mar 2011, at 18:15, Richard Holland wrote:

> I would have thought 10,000 seqs written out in full Genbank format in 20 
> seconds was pretty good! However, the key to speeding it up would be to 
> modify the OutputStream interactions to use faster things such as NIO. Also 
> it would depend on the source of your sequence objects - if they are all 
> in-memory then this isn't an issue, but if they are being read from a 
> database using lazy or dynamic loading then that could be a bottleneck too.
> 
> 
> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
> 
>> Hi,
>> 
>> I am developing a sequence annotation app. It should handle ± 100,000 
>> sequences per run.
>> 
>> When profiling the app (with 10,000 seqs), the total execution time was ± 20 
>> seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!
>> 
>> How could one improve the RichSequence.IOTools performance?
>> 
>> Thanks.
>> 
>> khalil
>> _______________________________________________
>> Biojava-l mailing list  -  [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: [email protected]
> http://www.eaglegenomics.com/
> 


