Hi David,
It isn't the processing of coordinates that takes the major proportion of
the time; usually it's the time the database takes to return the results.
Could you please check whether indices are in place for the table that serves
the sequence coordinates?
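
One way to run that kind of check, as a minimal sketch: ask the database for its query plan before and after adding an index. The example uses Python's built-in sqlite3 as a stand-in so that it is self-contained (on a MySQL-backed mart, `EXPLAIN` and `SHOW INDEX FROM <table>` play the same role). The table `seq_coords`, its columns, and the index name are hypothetical, not the actual mart schema.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the table that serves sequence coordinates.
conn.execute(
    "CREATE TABLE seq_coords (transcript_id INTEGER, chr TEXT, "
    "start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO seq_coords VALUES (?, ?, ?, ?)",
    [(i, "I", i * 100, i * 100 + 99) for i in range(1000)],
)

lookup = "SELECT * FROM seq_coords WHERE transcript_id = 42"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup)

# With an index on the lookup column it can search the index instead.
conn.execute("CREATE INDEX idx_seq_coords_tid ON seq_coords (transcript_id)")
plan_after = query_plan(conn, lookup)

print("without index:", plan_before)
print("with index:   ", plan_after)
```

If the plan for the coordinate lookup reports a full scan rather than an index search, a missing index is a likely cause of the slow response.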
Thanks,
Syed
David M. Goodstein wrote:
On 7 Jul 2009, at 12:23, Syed Haider wrote:
Hi Rochak,
Rochak Neupane wrote:
When querying for sequences on a dataset without any filters (so
as to grab the complete set of sequences), marts seem to be quite slow.
Pulling peptides for human, for example, took in excess of 7 minutes
from the Ensembl mart. Grabbing peptide sequences from Caenorhabditis
elegans (WormBase db, gene dataset from biomart.org
<http://biomart.org>) also took about 7 minutes for a file that
turns out to be 9 MB.
Our own mart is quite slow when querying a complete set of sequences
from an organism. Is it typical for BioMart to take 7-8 minutes or more when
querying for whole-genome sequences?
a- Are you using GenomicSequence to retrieve sequences?
b- Do you have the ORDER BY property set on the sequence exportables (if 'a'
is true)? Setting ORDER BY slows down the response considerably, and
it's only required to cope with inconsistencies in the row order that
should really be fixed at the mart (database) construction level.
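
To illustrate why an ORDER BY can cost extra time, here is a small sketch that compares the query plan of the same SELECT with and without ordering. It uses Python's built-in sqlite3 purely as a stand-in for the mart database, and the table `peptides` and its columns are hypothetical names, not part of any real mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table standing in for the source of a sequence exportable.
conn.execute(
    "CREATE TABLE peptides (transcript_id INTEGER, exon_rank INTEGER, seq TEXT)"
)
conn.executemany(
    "INSERT INTO peptides VALUES (?, ?, ?)",
    [(i % 50, i, "MKV") for i in range(500)],
)


def plan(sql):
    """Return the human-readable steps of the query plan as a list."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Same rows requested, once unordered and once ordered on unindexed columns.
unordered = plan("SELECT * FROM peptides")
ordered = plan("SELECT * FROM peptides ORDER BY transcript_id, exon_rank")

print("unordered:", unordered)
print("ordered:  ", ordered)
```

The ordered plan should contain an additional sorting step that the unordered one lacks; on a full, unfiltered export that sort runs over every row before any results can be returned.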
Best,
Syed
Removing the ORDER BY does improve performance (from 15 minutes down to
5 minutes for an unfiltered FASTA grab of approx. 30k peptides), but
that's still not really user-tolerable. Is that really the
expected behavior?
--David
David M. Goodstein
Joint Genome Institute / Lawrence Berkeley National Lab
Center for Integrative Genomics / UCBerkeley
http://www.phytozome.net
Thanks,
rochak