Hi,
Using NIO, the app's performance improved noticeably. The app was tested with
6599 annotated GenBank sequences.
1. RichSequence.IOTools.writeGenbank(myFileOutputStream, mySeq, null): 57% of
app execution time.
2. Writing mySeq -> ByteArrayOutputStream -> ByteBuffer -> FileChannel (code
below): 31% of execution time.
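// Serialize the sequence into memory first, then hand the bytes to the channel.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
RichSequence.IOTools.writeGenbank(baos, mySeq, null);
ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray());
// The channel contract allows partial writes, so loop until the buffer is drained.
while (buf.hasRemaining()) {
    fileChannel.write(buf);
}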
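
For anyone who wants to reproduce variant 2 outside the app, here is a minimal, self-contained sketch of the same idea. The RecordWriter interface is a hypothetical stand-in for the RichSequence.IOTools.writeGenbank(out, seq, null) call, so the sketch runs without BioJava on the classpath; the class and method names are illustrative assumptions, not part of any library. Reusing one ByteArrayOutputStream across sequences is my own addition and has not been profiled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class ChannelWriteSketch {

    /** Hypothetical stand-in for RichSequence.IOTools.writeGenbank(out, seq, null). */
    interface RecordWriter<T> {
        void write(OutputStream out, T record) throws IOException;
    }

    /** Writes every record to the target file through a FileChannel; returns the bytes written. */
    static <T> long writeAll(Path target, List<T> records, RecordWriter<T> writer)
            throws IOException {
        long total = 0;
        // One in-memory buffer, reset for each record instead of reallocated.
        ByteArrayOutputStream baos = new ByteArrayOutputStream(64 * 1024);
        try (FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (T record : records) {
                baos.reset();
                writer.write(baos, record);
                ByteBuffer buf = ByteBuffer.wrap(baos.toByteArray());
                // Loop in case the channel accepts only part of the buffer.
                while (buf.hasRemaining()) {
                    total += channel.write(buf);
                }
            }
        }
        return total;
    }
}
```

In the app, the lambda passed as writer would simply delegate to RichSequence.IOTools.writeGenbank(out, seq, null).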
Any suggestions on how to improve the performance further ;-) are welcome.
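
One variant that may be worth profiling next to the two above is the BufferedOutputStream route raised in the quoted reply below. This is only a sketch under assumptions: RecordWriter is again a hypothetical stand-in for the writeGenbank call, the buffer size is an arbitrary parameter, and I have no timings for it.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class BufferedWriteSketch {

    /** Hypothetical stand-in for RichSequence.IOTools.writeGenbank(out, seq, null). */
    interface RecordWriter<T> {
        void write(OutputStream out, T record) throws IOException;
    }

    /** Streams all records through one buffered stream, with no per-record byte[] copy. */
    static <T> void writeAll(Path target, List<T> records, RecordWriter<T> writer,
                             int bufferSize) throws IOException {
        // Files.newOutputStream creates or truncates the file by default.
        try (OutputStream out =
                 new BufferedOutputStream(Files.newOutputStream(target), bufferSize)) {
            for (T record : records) {
                writer.write(out, record);
            }
        } // closing the stream flushes whatever is still buffered
    }
}
```

Whether this beats the FileChannel variant would have to be measured on the same 6599 sequences.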
Regards,
khalil
On 28 Mar 2011, at 23:39, Andy Yates wrote:
> Dang Rich :).
>
> At the moment we've not done anything WRT Genbank outputting but would accept
> anything to help us out with this.
>
> As for the performance difference between BJ3 & BJ, what happens if you use
> the writer objects directly with a BufferedOutputStream writer? Have you got
> any profiling results? It would be very interesting to see where we've lost
> the performance ...
>
> Andy
>
> On 28 Mar 2011, at 18:23, Richard Holland wrote:
>
>> In which case you've got little option but to rewrite the GenbankFormat
>> module to use NIO or other alternative methods for writing files. However
>> before you do that I suggest you investigate the recent BioJava3
>> developments to see if they've already done anything in this area - Andy
>> Yates is your man there.
>>
>> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:
>>
>>> The sequence objects are all in memory.
>>> I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will
>>> process 100,000 seqs in each run, and IO is a real bottleneck. So I am
>>> trying, as far as I can, to fine-tune the app.
>>>
>>> Regards,
>>>
>>> khalil
>>>
>>> On 28 Mar 2011, at 18:15, Richard Holland wrote:
>>>
>>>> I would have thought 10,000 seqs written out in full Genbank format in 20
>>>> seconds was pretty good! However, the key to speeding it up would be to
>>>> modify the OutputStream interactions to use faster things such as NIO.
>>>> Also it would depend on the source of your sequence objects - if they are
>>>> all in-memory then this isn't an issue, but if they are being read from a
>>>> database using lazy or dynamic loading then that could be a bottleneck too.
>>>>
>>>>
>>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am developing a sequence annotation app. It should handle ± 100,000
>>>>> sequences per run.
>>>>>
>>>>> When profiling the app (with 10,000 seqs), the total execution time was ±
>>>>> 20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!
>>>>>
>>>>> How could one improve RichSequence.IOTools performance?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> khalil
>>>>> _______________________________________________
>>>>> Biojava-l mailing list - [email protected]
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Operations and Delivery Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: [email protected]
>>>> http://www.eaglegenomics.com/
>>>>
>>>
>>
>
> --
> Andrew Yates Ensembl Genomes Engineer
> EMBL-EBI Tel: +44-(0)1223-492538
> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/
>