Hi Betina,
We also use Apollo for significant manual curation here at TAIR, and
have run into performance issues when trying to load large numbers of
protein alignments as results.
I don't think this is so much Apollo's fault as the intrinsic nature of
the problem of searching and loading large amounts of range-based (i.e.
chromosome-mapped) information from a database.
We've always used our own custom-built data adapter to a proprietary
database schema we designed, and this worked OK for gene annotations and
cDNA/EST alignments. However, when we added protein BLAST alignments
into the mix, we took a serious performance hit on load times.
Our solution so far was to rewrite our data adapter for results (genes
stayed the same) to load from binary flat files coded as interval trees
using a piece of software called iit_get, which comes with Thomas Wu's
GMAP alignment software (which is open source, I believe). Interval
trees are a provably efficient approach for handling large amounts of
coordinate-based data, but they're not trivial to implement.
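
To give a rough idea of what an interval tree does, here is a minimal
Python sketch of a static "centered" interval tree with an overlap
query. It is only an illustration: it is not the iit_get file format
and not our data adapter, and the names used (Node, build, query, and
the sample alignment labels) are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (start, end, label) with inclusive coordinates; labels are hypothetical.
Interval = Tuple[int, int, str]


@dataclass
class Node:
    center: int
    by_start: List[Interval]  # intervals covering center, start ascending
    by_end: List[Interval]    # same intervals, end descending
    left: Optional["Node"]    # intervals entirely left of center
    right: Optional["Node"]   # intervals entirely right of center


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build the tree once, up front, from all intervals."""
    if not intervals:
        return None
    # Use the median endpoint as the center so both subtrees shrink.
    points = sorted(p for s, e, _ in intervals for p in (s, e))
    center = points[len(points) // 2]
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return Node(
        center,
        sorted(here, key=lambda iv: iv[0]),
        sorted(here, key=lambda iv: -iv[1]),
        build(left),
        build(right),
    )


def query(node: Optional[Node], qs: int, qe: int,
          out: Optional[List[Interval]] = None) -> List[Interval]:
    """Return every stored interval overlapping the range [qs, qe]."""
    if out is None:
        out = []
    if node is None:
        return out
    if qe < node.center:
        # Range lies left of center: only intervals starting by qe overlap.
        for iv in node.by_start:
            if iv[0] > qe:
                break
            out.append(iv)
        query(node.left, qs, qe, out)
    elif qs > node.center:
        # Range lies right of center: only intervals ending at or after qs.
        for iv in node.by_end:
            if iv[1] < qs:
                break
            out.append(iv)
        query(node.right, qs, qe, out)
    else:
        # Range contains center: everything stored here overlaps.
        out.extend(node.by_start)
        query(node.left, qs, qe, out)
        query(node.right, qs, qe, out)
    return out


alignments = [
    (100, 200, "est1"),
    (150, 400, "prot1"),
    (500, 600, "read1"),
    (50, 80, "rep1"),
]
tree = build(alignments)
hits = sorted(label for _, _, label in query(tree, 180, 520))
print(hits)
```

The point is that a query only walks the branches that can contain
overlapping features, instead of scanning every alignment on the
chromosome, which is where the time goes once the result sets get big.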
Removing the relational database (in our case MySQL) for the results and
replacing it with binary-encoded interval trees that store the
alignments in flat files addressed the speed issue, at least to a degree
that our curators think they can work efficiently.
Regrettably, this is a significant departure from supported Apollo use
cases and requires some serious setup effort, so it's probably of little
help to you unless you have some good IT help.
However, if you are interested in our "hack" we can definitely share the
details with you and/or your IT staff.
Chris Wilks
TAIR
Betina Porcel wrote:
Hi there!
Today I have some doubts concerning Apollo's performance.
Here is my problem:
I'm still using Apollo 1.10.0 connected to a Chado DB.
I've been testing the time Apollo needs to open a genomic region, and
here is the result:
when trying a region without much in it (some repeats and abInitio
results, always on the results side), Apollo takes more or less 30
seconds to open a 5 kb region. But when trying a region where all
the resources are present (results of protein alignments, ESTs, and
mostly 454 reads!), opening 10 kb takes me 5 minutes.
I know that the more results you have in the database, the longer
Apollo takes to open your genomic region, but is the performance I'm
getting "normal"? What about your experiences?
We'll be starting a manual curation project soon, and I'm pretty
worried about the time it will take our colleagues to open
Apollo!!!!
Any suggestions?
Thanks a lot!!!!
Betina
_______________________________________________
apollo mailing list
[email protected]
http://mail.fruitfly.org/mailman/listinfo/apollo