Dang Rich :). 

At the moment we've not done anything on Genbank output in BioJava3, but we 
would welcome any contribution that helps us out with this. 

As for the performance difference between BJ3 & BJ, what happens if you use 
the writer objects directly and wrap the output in a BufferedOutputStream? 
Have you got any profiling results? It would be very interesting to see where 
we've lost the performance.
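
To make the buffering question concrete, here is a minimal sketch. It does not 
use BioJava at all: writeRecords is a hypothetical stand-in for a Genbank 
writer that emits several small writes per record, and CountingStream is a 
made-up helper that counts how many write calls actually reach the underlying 
sink. The assumption is that the real writer accepts a plain OutputStream, so 
the same wrapping would apply to it.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from BioJava.
class BufferedWriteSketch {

    /** Counts how many write calls reach the wrapped sink. */
    static final class CountingStream extends OutputStream {
        private final OutputStream sink;
        long calls = 0;

        CountingStream(OutputStream sink) { this.sink = sink; }

        @Override public void write(int b) throws IOException {
            calls++;
            sink.write(b);
        }

        @Override public void write(byte[] b, int off, int len) throws IOException {
            calls++;
            sink.write(b, off, len);
        }
    }

    /** Stand-in for a format writer: four small writes per toy record. */
    static void writeRecords(OutputStream out, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            out.write(("LOCUS       SEQ" + i + "\n").getBytes(StandardCharsets.US_ASCII));
            out.write("ORIGIN\n".getBytes(StandardCharsets.US_ASCII));
            out.write("        1 acgtacgtac\n".getBytes(StandardCharsets.US_ASCII));
            out.write("//\n".getBytes(StandardCharsets.US_ASCII));
        }
    }

    /** Returns the number of write calls that hit the sink. */
    static long countSinkWrites(int records, boolean buffered) {
        try {
            CountingStream sink = new CountingStream(new ByteArrayOutputStream());
            // 64 KiB is an arbitrary buffer size chosen for the sketch.
            OutputStream out = buffered ? new BufferedOutputStream(sink, 64 * 1024) : sink;
            writeRecords(out, records);
            out.flush(); // push any bytes still held in the buffer
            return sink.calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unbuffered sink writes: " + countSinkWrites(10_000, false));
        System.out.println("buffered sink writes:   " + countSinkWrites(10_000, true));
    }
}
```

If the buffered count is far lower than the unbuffered one, as it should be 
here, that only shows fewer calls reaching the sink. Whether it translates 
into a real speed-up for Genbank output is something a profiler run on the 
actual app would have to confirm.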

Andy

On 28 Mar 2011, at 18:23, Richard Holland wrote:

> In which case you've got little option but to rewrite the GenbankFormat 
> module to use NIO or other alternative methods for writing files. However 
> before you do that I suggest you investigate the recent BioJava3 developments 
> to see if they've already done anything in this area - Andy Yates is your man 
> there.
> 
> On 28 Mar 2011, at 18:11, Khalil El Mazouari wrote:
> 
>> The sequence objects are all in memory.
>> I agree, 10,000 seqs in ± 20 sec is not bad. However, scientists will 
>> process 100,000 seqs in each run, and IO is a real bottleneck. So I am 
>> trying, as far as I can, to fine-tune the app.
>> 
>> Regards,
>> 
>> khalil
>> 
>> On 28 Mar 2011, at 18:15, Richard Holland wrote:
>> 
>>> I would have thought 10,000 seqs written out in full Genbank format in 20 
>>> seconds was pretty good! However, the key to speeding it up would be to 
>>> modify the OutputStream interactions to use faster things such as NIO. Also 
>>> it would depend on the source of your sequence objects - if they are all 
>>> in-memory then this isn't an issue, but if they are being read from a 
>>> database using lazy or dynamic loading then that could be a bottleneck too.
>>> 
>>> 
>>> On 28 Mar 2011, at 17:07, Khalil El Mazouari wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am developing a sequence annotation app. It should handle ± 100,000 
>>>> sequences per run.
>>>> 
>>>> When profiling the app (with 10,000 seqs), the total execution time was ± 
>>>> 20 seconds, of which 57% was spent in RichSequence.IOTools.writeGenbank!
>>>> 
>>>> How could one improve the RichSequence.IOTools performance? 
>>>> 
>>>> Thanks.
>>>> 
>>>> khalil
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  [email protected]
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>> 
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: [email protected]
>>> http://www.eaglegenomics.com/
>>> 
>> 
> 
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/




