Martin, Khalil,

In the code sample, you check whether the taxon is in a list. I suspect that operation is slower than you intend, since a list lookup is typically a linear scan that is repeated for every record. You might try using a TreeSet instead and see whether the lookup performance improves.
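To illustrate the lookup change, here is a minimal sketch. The class name, the taxon names, and the isWanted helper are hypothetical and are not taken from the original code sample; the idea is only to copy the list into a TreeSet once, before the parsing loop, and to test membership against the set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TaxonLookup {
    // Hypothetical taxon names, for illustration only.
    static final List<String> TAXA_LIST =
            Arrays.asList("Homo sapiens", "Mus musculus", "Danio rerio");

    // Build the sorted set once; contains() on a TreeSet is O(log n),
    // whereas contains() on an ArrayList walks the list element by element.
    static final Set<String> TAXA_SET = new TreeSet<String>(TAXA_LIST);

    // Membership test to call for each parsed record.
    static boolean isWanted(String taxon) {
        return TAXA_SET.contains(taxon);
    }

    public static void main(String[] args) {
        System.out.println(isWanted("Mus musculus"));  // prints "true"
        System.out.println(isWanted("Gallus gallus")); // prints "false"
    }
}
```

If the sorted order is not needed, a HashSet would also work here and gives constant-time lookups on average.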
As for GenBank parsing performance itself, have you tried parsing the GenBank XML files, and if so, did you notice any performance difference?

If you're looking for something similar to GPars in Java, you might try ThreadPoolExecutor <http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html>, which manages a thread pool and queues Runnable tasks for it.

Hope this helps,

Mark

PS: If you have Groovy code that you'd like to share, feel free to add examples to the BioGroovy wiki <http://biogroovy.open-bio.org/wiki/Main_Page>.

On Jun 17, 2011 4:16 AM, "Khalil El Mazouari" <[email protected]> wrote:

Good suggestion ;)

However, I am not familiar with Groovy. I'll look for something similar in Java.

Regards,

khalil

On 17 Jun 2011, at 12:36, Martin Jones wrote:

> Yes, this approach won't be much use if you are interested in the
> contents of every genbank record.
>
> Have you thought about parsing the gb files in parallel? In my
> experience, parsing genbank files scales quite nicely when done in
> multiple threads. I have used the GPars library for this type of job
> and it is very nice to use:
>
> http://gpars.codehaus.org/Parallelizer
>
> M
>
> On 17 June 2011 11:33, Khalil El Mazouari <[email protected]> wrote:
>> Thanks ...

_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
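For the ThreadPoolExecutor suggestion in this thread, a rough sketch of the pattern might look like the following. The class name and the countRecords method are hypothetical: countRecords only counts LOCUS lines in flat-file text and stands in for whatever real GenBank parser you use. Only the java.util.concurrent calls are real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ParallelGenbankSketch {

    // Hypothetical stand-in for a real parser: counts LOCUS lines in flat-file text.
    static int countRecords(String genbankText) {
        int count = 0;
        for (String line : genbankText.split("\n")) {
            if (line.startsWith("LOCUS")) {
                count++;
            }
        }
        return count;
    }

    // Submits one task per input and returns the results in input order.
    static List<Integer> countAll(List<String> inputs, int threads) {
        // Fixed-size pool; tasks that cannot run yet wait in the unbounded queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final String input : inputs) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return countRecords(input);
                    }
                }));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A parsing task failed", e);
        } finally {
            pool.shutdown(); // stop accepting new tasks and let the worker threads exit
        }
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "LOCUS       A\n//\nLOCUS       B\n//\n",
                "LOCUS       C\n//\n");
        System.out.println(countAll(inputs, 2)); // prints "[2, 1]"
    }
}
```

In real use, each task would more likely take a file path and open its own reader, so that no parser state is shared between threads.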
