Dang Rich :). At the moment we've not done anything WRT GenBank output, but we'd welcome anything that helps us out with this.
As for the performance difference between BJ3 and BJ, what happens if you use the writer objects directly with a BufferedOutputStream? Have you got any profiling results? It would be very interesting to see where we've lost the performance...

Andy

On 28 Mar 2011, at 18:23, Richard Holland wrote:

> In which case you've got little option but to rewrite the GenbankFormat
> module to use NIO or other alternative methods for writing files. However,
> before you do that I suggest you investigate the recent BioJava3 developments
> to see if they've already done anything in this area - Andy Yates is your man
> there.
>
> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:
>
>> Sequence objects are all in-memory.
>> I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will
>> process 100,000 seqs in each run, and IO is a real bottleneck. So I am
>> trying, as far as I can, to fine-tune the app.
>>
>> Regards,
>>
>> khalil
>>
>> On 28 Mar 2011, at 18:15, Richard Holland wrote:
>>
>>> I would have thought 10,000 seqs written out in full GenBank format in 20
>>> seconds was pretty good! However, the key to speeding it up would be to
>>> modify the OutputStream interactions to use faster things such as NIO. It
>>> would also depend on the source of your sequence objects - if they are all
>>> in-memory then this isn't an issue, but if they are being read from a
>>> database using lazy or dynamic loading then that could be a bottleneck too.
>>>
>>>
>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>>>
>>>> Hi,
>>>>
>>>> I am developing a sequence annotation app. It should handle ± 100,000
>>>> sequences per run.
>>>>
>>>> When profiling the app (with 10,000 seqs), the total execution time was ±
>>>> 20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!!
>>>>
>>>> How could one improve the RichSequence.IOTools performance?
>>>>
>>>> Thanks.
>>>>
>>>> khalil
>>>> _______________________________________________
>>>> Biojava-l mailing list - [email protected]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: [email protected]
>>> http://www.eaglegenomics.com/
>>>
>>
>

--
Andrew Yates
Ensembl Genomes Engineer
EMBL-EBI
Tel: +44-(0)1223-492538
Fax: +44-(0)1223-494468
Wellcome Trust Genome Campus
Cambridge CB10 1SD, UK
http://www.ensemblgenomes.org/
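Andy's question about wrapping the output in a BufferedOutputStream can be illustrated with plain JDK classes. This is a minimal sketch under stated assumptions, not BioJava code: `fakeRecord` is a hypothetical placeholder for one formatted GenBank entry, and `CountingOutputStream` is a hypothetical wrapper that counts how many write calls reach the underlying stream (a rough proxy for system calls against a real file). In a real application the analogous step would be handing the BioJava writer a `new BufferedOutputStream(new FileOutputStream(...))` rather than an unbuffered stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BufferedWriteDemo {

    /** Hypothetical wrapper: counts write calls that reach the underlying stream (stands in for a file). */
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        int writeCalls = 0;

        CountingOutputStream(OutputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(int b) throws IOException {
            writeCalls++;
            delegate.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            writeCalls++;
            delegate.write(b, off, len);
        }
    }

    /** Hypothetical stand-in for one GenBank-like record; NOT BioJava's GenbankFormat output. */
    static byte[] fakeRecord(int i) {
        String rec = "LOCUS       seq" + i + "\nORIGIN\n        1 acgtacgtac\n//\n";
        return rec.getBytes(StandardCharsets.US_ASCII);
    }

    /** Writes n records, optionally through a 64 KiB buffer, and returns how many writes hit the sink. */
    static int writeRecords(int n, boolean buffered) throws IOException {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        OutputStream out = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;
        for (int i = 0; i < n; i++) {
            out.write(fakeRecord(i)); // one small write per record
        }
        out.flush(); // push any bytes still held in the buffer
        return sink.writeCalls;
    }

    public static void main(String[] args) throws IOException {
        int n = 10_000;
        System.out.println("unbuffered write calls: " + writeRecords(n, false));
        System.out.println("buffered write calls:   " + writeRecords(n, true));
    }
}
```

Running `main` should report 10,000 write calls for the unbuffered case and only a handful for the buffered one; the exact buffered count depends on the JDK's BufferedOutputStream internals.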
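Richard's NIO suggestion could look something like the sketch below, which batches records into a direct `ByteBuffer` and drains it to a `FileChannel` whenever it fills. `NioRecordWriter`, `writeRecords` and the record strings are hypothetical; this is not how BioJava's GenbankFormat writes files. NIO by itself is also not guaranteed to beat a well-buffered stream, since much of the gain typically comes from issuing fewer, larger writes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class NioRecordWriter {

    /** Hypothetical helper: writes each record via a 64 KiB direct buffer; returns the final file size. */
    static long writeRecords(Path file, Iterable<String> records) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (String rec : records) {
                byte[] bytes = rec.getBytes(StandardCharsets.US_ASCII);
                if (bytes.length > buf.remaining()) {
                    drain(ch, buf); // not enough room: flush what we have first
                }
                if (bytes.length > buf.capacity()) {
                    // Record larger than the whole buffer: write it straight to the channel.
                    ByteBuffer big = ByteBuffer.wrap(bytes);
                    while (big.hasRemaining()) {
                        ch.write(big);
                    }
                } else {
                    buf.put(bytes);
                }
            }
            drain(ch, buf); // write any remaining buffered bytes
            return ch.size();
        }
    }

    /** Flips the buffer, writes all pending bytes to the channel, then resets it for reuse. */
    private static void drain(FileChannel ch, ByteBuffer buf) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
        buf.clear();
    }

    public static void main(String[] args) throws IOException {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add("LOCUS       seq" + i + "\nORIGIN\n        1 acgtacgtac\n//\n");
        }
        Path out = Files.createTempFile("records", ".gb");
        System.out.println("bytes written: " + writeRecords(out, records));
        Files.delete(out);
    }
}
```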
