The sequence objects are all in memory. I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will process 100,000 seqs in each run, and I/O is a real bottleneck. So I am trying, as far as I can, to fine-tune the app.

Regards,
khalil

On 28 Mar 2011, at 18:15, Richard Holland wrote:

> I would have thought 10,000 seqs written out in full Genbank format in 20
> seconds was pretty good! However, the key to speeding it up would be to
> modify the OutputStream interactions to use faster things such as NIO. Also
> it would depend on the source of your sequence objects - if they are all
> in-memory then this isn't an issue, but if they are being read from a
> database using lazy or dynamic loading then that could be a bottleneck too.
>
>
> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>
>> Hi,
>>
>> I am developing a sequence annotation app. It should handle ± 100,000
>> sequences per run.
>>
>> When profiling the app (with 10,000 seqs), the total execution time was ± 20
>> seconds, of which 57% was used for RichSequence.IOTools.writeGenbank!!
>>
>> How could one improve the RichSequence.IOTools performance?
>>
>> Thanks.
>>
>> khalil
>> _______________________________________________
>> Biojava-l mailing list - [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: [email protected]
> http://www.eaglegenomics.com/
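
For reference, the suggestion about the OutputStream interactions could look roughly like the sketch below. It does not use NIO; it shows the simpler step of wrapping the destination stream once in a BufferedOutputStream so that many small per-record writes are batched. No BioJava calls are used: writeRecord is a hypothetical stand-in for a per-sequence Genbank writer, CountingStream exists only to make the effect visible, and the 64 KB buffer size is an assumed value, not a measured optimum.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BufferedWriteSketch {

    /** Counts how many write calls actually reach the underlying stream. */
    static final class CountingStream extends FilterOutputStream {
        int writeCalls = 0;

        CountingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            writeCalls++;
            out.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writeCalls++;
            out.write(b, off, len);
        }
    }

    /** Hypothetical stand-in for a per-sequence writer: record text plus a "//" terminator. */
    static void writeRecord(String record, OutputStream out) throws IOException {
        out.write(record.getBytes(StandardCharsets.US_ASCII));
        out.write("\n//\n".getBytes(StandardCharsets.US_ASCII));
    }

    /** Writes all records through one shared buffer, flushing once at the end. */
    static void writeAll(List<String> records, OutputStream sink) throws IOException {
        // Assumed buffer size of 64 KB; tune after profiling.
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 64 * 1024);
        for (String record : records) {
            writeRecord(record, buffered);
        }
        buffered.flush();
    }

    public static void main(String[] args) throws IOException {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add("LOCUS       seq" + i);
        }

        // Unbuffered: every small write goes straight to the sink.
        CountingStream direct = new CountingStream(new ByteArrayOutputStream());
        for (String record : records) {
            writeRecord(record, direct);
        }

        // Buffered: small writes are collected and passed on in large chunks.
        CountingStream viaBuffer = new CountingStream(new ByteArrayOutputStream());
        writeAll(records, viaBuffer);

        System.out.println("direct write calls:   " + direct.writeCalls);
        System.out.println("buffered write calls: " + viaBuffer.writeCalls);
    }
}
```

With a real file or socket as the sink, each call that reaches the underlying stream can cost a system call, which is why batching tends to help. Whether it helps here depends on how the stream passed to the writer is currently constructed, so it is worth profiling again after the change.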
