Hi Betina,

We also use Apollo for significant manual curation here at TAIR, and have run into performance issues when trying to load large numbers of protein alignments as results. I don't think this is Apollo's fault so much as the intrinsic nature of the problem: searching and loading large amounts of range-based (i.e. chromosome-mapped) information from a database. We've always used our own custom-built data adapter to a proprietary database schema we designed, and this worked OK for gene annotations and cDNA/EST alignments. However, when we added protein BLAST alignments into the mix, we took a serious performance hit on load times. Our solution so far has been to rewrite our data adapter for results (genes stayed the same) to load from binary flat files encoded as interval trees, using a program called iit_get that comes with Thomas Wu's GMAP alignment software (which I believe is open source). Interval trees are a provably efficient approach for handling large amounts of coordinate-based data (finding the k features that overlap a query region takes O(log n + k) time for n stored features), but they're not trivial to implement.
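To illustrate why an interval tree helps with "which features overlap this region?" queries, here is a minimal sketch of a centered interval tree in Python. It is only an in-memory illustration under assumptions of our own: features are inclusive (start, end, label) tuples, and the names `build` and `overlapping` are hypothetical. It is not the binary on-disk format that iit_get reads, nor our actual data adapter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical feature representation: (start, end, label), inclusive coordinates.
Interval = Tuple[int, int, str]


@dataclass
class Node:
    center: int
    by_start: List[Interval]  # intervals containing center, sorted by start ascending
    by_end: List[Interval]    # the same intervals, sorted by end descending
    left: Optional["Node"] = None   # intervals lying entirely left of center
    right: Optional["Node"] = None  # intervals lying entirely right of center


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build a centered interval tree, splitting on the median endpoint."""
    if not intervals:
        return None
    points = sorted(p for s, e, _ in intervals for p in (s, e))
    center = points[len(points) // 2]
    # At least one interval contains center (it is one of their endpoints),
    # so both child lists are strictly smaller and the recursion terminates.
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return Node(
        center,
        sorted(here, key=lambda iv: iv[0]),
        sorted(here, key=lambda iv: iv[1], reverse=True),
        build(left),
        build(right),
    )


def _query(node: Optional[Node], qs: int, qe: int, out: List[Interval]) -> None:
    if node is None:
        return
    if qe < node.center:
        # Query lies left of center: a node interval overlaps iff start <= qe.
        for iv in node.by_start:
            if iv[0] > qe:
                break
            out.append(iv)
        _query(node.left, qs, qe, out)
    elif qs > node.center:
        # Query lies right of center: a node interval overlaps iff end >= qs.
        for iv in node.by_end:
            if iv[1] < qs:
                break
            out.append(iv)
        _query(node.right, qs, qe, out)
    else:
        # Query contains center, so every interval stored here overlaps it.
        out.extend(node.by_start)
        _query(node.left, qs, qe, out)
        _query(node.right, qs, qe, out)


def overlapping(tree: Optional[Node], qs: int, qe: int) -> List[str]:
    """Return the sorted labels of all features overlapping [qs, qe]."""
    out: List[Interval] = []
    _query(tree, qs, qe, out)
    return sorted(label for _, _, label in out)


if __name__ == "__main__":
    tree = build([(100, 500, "gene1"), (450, 900, "est1"),
                  (1200, 1800, "blastx1"), (1700, 2500, "blastx2")])
    print(overlapping(tree, 480, 1250))  # ['blastx1', 'est1', 'gene1']
```

Each query only descends one path of the tree plus the branches that actually contain hits, which is where the O(log n + k) behaviour comes from; a real system would also need a compact on-disk layout so the tree does not have to be rebuilt on every load.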

Removing the relational database (in our case MySQL) for the results and replacing it with binary-encoded interval trees stored as flat files addressed the speed issue, at least well enough that our curators feel they can work efficiently.

Regrettably, this is a significant departure from supported Apollo use cases and requires some serious setup effort, so it's probably of little help to you unless you have good IT support.

However, if you are interested in our "hack", we can definitely share the details with you and/or your IT staff.

Chris Wilks
TAIR

Betina Porcel wrote:

Hi there!
Today I have some questions about Apollo's performance.
Here is my problem:
I'm still using Apollo 1.10.0 connected to a Chado DB.

I've been testing the time Apollo needs to open a genomic region, and here are the results:

When I try a region without much data (some repeats and ab initio results, always on the results side), Apollo takes roughly 30 seconds to open a 5 kb region. But! When I try a region where all the resources are present (protein alignment results, ESTs, and mostly 454 reads!), opening 10 kb takes 5 minutes.

I know that the more results you have in the database, the longer Apollo takes to open your genomic region, but is the performance I'm getting "normal"? What about your experiences? We'll be starting a manual curation project soon, and I'm pretty worried about how long it will take our colleagues to open Apollo!!!!


Any suggestions?


Thanks a lot!!!!

Betina

_______________________________________________
apollo mailing list
[email protected]
http://mail.fruitfly.org/mailman/listinfo/apollo
