Re: [Biojava-l] RichSequence.IOTools performance

Khalil El Mazouari Tue, 29 Mar 2011 07:42:38 -0700

Hi,

Using NIO, the app's performance improved noticeably. The app was tested with 
6599 annotated GenBank sequences.


1. RichSequence.IOTools.writeGenbank(myFileOutputStream, mySeq, null): 57% of 
app exec time.
2. writing mySeq -> byteArrayOutputStream -> byteBuffer -> fileChannel (code 
below): 31% of exec time.

         ByteArrayOutputStream baos = new ByteArrayOutputStream();
         RichSequence.IOTools.writeGenbank(baos, mySeq, null);
         ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray());
         fileChannel.write(buf);
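
One possible next step, offered only as a sketch: the two ideas raised in this thread (pushing each serialized record through a FileChannel, and writing everything through a single BufferedOutputStream) can be compared side by side without BioJava on the classpath. In the code below, RecordWriter is a hypothetical stand-in for a call such as RichSequence.IOTools.writeGenbank(out, mySeq, null); the class name, the method names and the 64 KB buffer size are assumptions as well. It needs Java 7 or later (java.nio.file, try-with-resources). Whether either variant is actually faster for real GenBank output would have to be profiled.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class BatchedWriteSketch {

    /** Hypothetical stand-in for a serializer such as RichSequence.IOTools.writeGenbank. */
    interface RecordWriter<T> {
        void write(OutputStream out, T record) throws IOException;
    }

    /** Variant A: serialize each record into one reused in-memory buffer, then write it to the channel. */
    static <T> void writeViaChannel(Path file, List<T> records, RecordWriter<T> writer)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            // Assumed initial capacity; the buffer grows if a record is larger.
            ByteArrayOutputStream baos = new ByteArrayOutputStream(64 * 1024);
            for (T record : records) {
                baos.reset(); // reuse the backing array instead of allocating a new stream
                writer.write(baos, record);
                ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray());
                // write() may consume only part of the buffer, so loop until it is drained.
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
            }
        }
    }

    /** Variant B: one BufferedOutputStream over the channel for the whole run. */
    static <T> void writeViaBufferedStream(Path file, List<T> records, RecordWriter<T> writer)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
             OutputStream out = new BufferedOutputStream(
                     Channels.newOutputStream(channel), 64 * 1024)) {
            for (T record : records) {
                writer.write(out, record); // small writes are coalesced by the buffer
            }
        } // closing the stream flushes any remaining buffered bytes
    }
}
```

With BioJava available, the writer argument would presumably wrap the writeGenbank call shown above. Variant B avoids the extra copy made by toByteArray(), which may matter when records are large.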

Any suggestions on how to improve the performance (further ;-)) are welcome.

Regards,

khalil

On 28 Mar 2011, at 23:39, Andy Yates wrote:

> Dang Rich :). 
> 
> At the moment we've not done anything WRT Genbank outputting but would accept 
> anything to help us out with this. 
> 
> As for the performance difference between BJ3 & BJ, what happens if you use 
> the writer objects directly with a BufferedOutputStream writer? Have you got 
> any profiling results? It would be very interesting to see where we've lost 
> the performance ...
> 
> Andy
> 
> On 28 Mar 2011, at 18:23, Richard Holland wrote:
> 
>> In which case you've got little option but to rewrite the GenbankFormat 
>> module to use NIO or other alternative methods for writing files. However 
>> before you do that I suggest you investigate the recent BioJava3 
>> developments to see if they've already done anything in this area - Andy 
>> Yates is your man there.
>> 
>> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:
>> 
>>> The sequence objects are all in memory.
>>> I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will 
>>> process 100,000 seqs in each run, and IO is a real bottleneck. So I am 
>>> trying, as far as I can, to fine-tune the app.
>>> 
>>> Regards,
>>> 
>>> khalil
>>> 
>>> On 28 Mar 2011, at 18:15, Richard Holland wrote:
>>> 
>>>> I would have thought 10,000 seqs written out in full Genbank format in 20 
>>>> seconds was pretty good! However, the key to speeding it up would be to 
>>>> modify the OutputStream interactions to use faster things such as NIO. 
>>>> Also it would depend on the source of your sequence objects - if they are 
>>>> all in-memory then this isn't an issue, but if they are being read from a 
>>>> database using lazy or dynamic loading then that could be a bottleneck too.
>>>> 
>>>> 
>>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am developing a sequence annotation app. It should handle ± 100,000 
>>>>> sequences per run.
>>>>> 
>>>>> When profiling the app (with 10,000 seqs), the total execution time was ± 
>>>>> 20 seconds, of which 57% was used for RichSequence.IOTools.writeGenbank!!
>>>>> 
>>>>> How could one improve the RichSequence.IOTools performance? 
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> khalil
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  [email protected]
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>> 
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: [email protected]
>>>> http://www.eaglegenomics.com/
>>>> 
>>> 
>> 
> 
> -- 
> Andrew Yates                   Ensembl Genomes Engineer
> EMBL-EBI                       Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> 
> 
> 
> 


