Re: [apollo] Apollo performance

cwilks Thu, 24 Sep 2009 12:06:14 -0700

Hi Betina,

We also use Apollo for significant manual curation here at TAIR, and have run into performance issues when trying to load large amounts of protein alignments as results.

I don't think this is so much Apollo's fault as the intrinsic nature of the problem of searching and loading large amounts of range-based (i.e. chromosome-mapped) information from a database.

We've always used our own custom-built data adapter to a proprietary database schema we designed, and this worked OK for gene annotations and cDNA/EST alignments. However, when we added protein BLAST alignments into the mix, we took a serious performance hit on load times.

Our solution so far was to rewrite our data adapter for results (genes stayed the same) to load from binary flat files coded as interval trees using a piece of software called iit_get, which comes with Thomas Wu's GMAP alignment software (which is open source, I believe). Interval trees are a provably efficient approach for handling large amounts of coordinate-based data, but they're not trivial to implement.

Removing the relational database (in our case MySQL) for the results and replacing it with binary-encoded interval trees that store the alignments as flat files addressed the speed issue, at least to a degree that our curators think they can work efficiently.
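To illustrate why an interval tree helps with range queries, here is a minimal in-memory sketch in Python. It is not the iit_get file format or API, and it is not our data adapter; the names (Node, build, query) and the sample alignments are made up for the example. Each node stores one alignment plus the largest end coordinate found in its subtree, which lets a query skip whole subtrees that cannot overlap the requested region.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (start, end, label) with closed coordinates; the labels are hypothetical.
Interval = Tuple[int, int, str]


@dataclass
class Node:
    interval: Interval
    max_end: int  # largest end coordinate anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build a balanced tree keyed on start coordinate."""
    def _build(items: List[Interval]) -> Optional[Node]:
        if not items:
            return None
        mid = len(items) // 2
        left = _build(items[:mid])
        right = _build(items[mid + 1:])
        max_end = items[mid][1]
        for child in (left, right):
            if child is not None:
                max_end = max(max_end, child.max_end)
        return Node(items[mid], max_end, left, right)

    return _build(sorted(intervals))


def query(node: Optional[Node], start: int, end: int,
          hits: Optional[List[Interval]] = None) -> List[Interval]:
    """Return every stored interval that overlaps [start, end]."""
    if hits is None:
        hits = []
    # Nothing in this subtree ends at or after the query start: prune it.
    if node is None or node.max_end < start:
        return hits
    query(node.left, start, end, hits)
    s, e, _ = node.interval
    if s <= end and e >= start:
        hits.append(node.interval)
    # Everything to the right starts at or after s, so it can only
    # overlap the query if s itself is not past the query end.
    if s <= end:
        query(node.right, start, end, hits)
    return hits


alignments = [
    (100, 500, "est_1"),
    (450, 900, "prot_1"),
    (2000, 2600, "prot_2"),
    (5000, 5200, "read_1"),
]
tree = build(alignments)
print(query(tree, 400, 1000))
# [(100, 500, 'est_1'), (450, 900, 'prot_1')]
```

For a region containing few alignments, a query like this visits only a small part of the tree instead of scanning every row, which is the property that matters when a region is covered by many protein alignments or reads. A file-based implementation would serialize the same structure to disk, which this sketch does not attempt.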

Regrettably this is a significant departure from supported Apollo use cases and requires some serious setup effort, so it's probably of little help to you, unless you have some good IT help.

However, if you are interested in our "hack" we can definitely share the details with you and/or your IT staff.


Chris Wilks
TAIR





Betina Porcel wrote:

Hi there!
Today I have some doubts concerning Apollo's performance.
Here is my problem:
I'm still using Apollo 1.10.0 connected to a Chado DB.
I've been testing the time Apollo needs to open a genomic region, and here is the result:
when trying a region without much in it (some repeats and ab initio results, always on the results side), Apollo takes more or less 30 seconds to open a 5 kb region. But when trying a region where all the resources are present (results of protein alignments, ESTs, and mostly 454 reads!), opening 10 kb takes me 5 minutes.
I know that the more results you have in the database, the longer Apollo takes to open your genomic region. Is the performance I'm getting "normal"? What about your experiences? We'll be starting a manual curation project soon, and I'm pretty worried about the time it will take for our colleagues to open Apollo!
Any suggestions?


Thanks a lot!!!!

Betina

_______________________________________________
apollo mailing list
[email protected]
http://mail.fruitfly.org/mailman/listinfo/apollo


