We are very happy users of MySQL, on which we run a nice big genome
database. We have a web front end to the database (www.ensembl.org, if
anyone is interested) and we are looking at getting our pages to come back
faster.


I have been moving the heavier lifting between the database and the actual
HTML/GIF generation out of Perl into C. Although there is a lot more
mileage in this, we may soon be running up against a MySQL-side limit.


The main queries join two tables, both with indexes, and look like this:



  sprintf(sqlbuffer,
          "SELECT IF(sgp.raw_ori=1,(f.seq_start+sgp.chr_start-sgp.raw_start-%d),"
          "          (sgp.chr_start+sgp.raw_end-f.seq_end-%d)),"
          " IF(sgp.raw_ori=1,(f.seq_end+sgp.chr_start-sgp.raw_start-%d),"
          "    (sgp.chr_start+sgp.raw_end-f.seq_start-%d)),"
          " IF(sgp.raw_ori=1,f.strand,(-f.strand)),"
          " f.id, f.score, f.analysis,"
          " f.hstart, f.hend, f.hid"
          " FROM %s f,static_golden_path sgp"
          " WHERE sgp.raw_id = f.contig AND sgp.chr_end >= %d"
          " AND sgp.chr_start <= %d AND sgp.chr_name = '%s'"
          " AND f.score > '%s'",
          start, start, start, start, table, start, end, chr_name, score);
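As an aside, since this buffer is built from web-facing parameters, a
bounds-checked variant using snprintf may be worth considering. This is
only a sketch -- the function name and parameter types are made up here,
and the query text is the same one as above:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: the same query as above, built with snprintf so an
 * unexpectedly long table or chromosome name cannot overrun the buffer.
 * Returns the would-be length; a value >= buflen means truncation. */
static int build_feature_query(char *buf, size_t buflen,
                               int start, int end,
                               const char *table,
                               const char *chr_name,
                               const char *score)
{
    return snprintf(buf, buflen,
        "SELECT IF(sgp.raw_ori=1,(f.seq_start+sgp.chr_start-sgp.raw_start-%d),"
        "          (sgp.chr_start+sgp.raw_end-f.seq_end-%d)),"
        " IF(sgp.raw_ori=1,(f.seq_end+sgp.chr_start-sgp.raw_start-%d),"
        "    (sgp.chr_start+sgp.raw_end-f.seq_start-%d)),"
        " IF(sgp.raw_ori=1,f.strand,(-f.strand)),"
        " f.id, f.score, f.analysis, f.hstart, f.hend, f.hid"
        " FROM %s f, static_golden_path sgp"
        " WHERE sgp.raw_id = f.contig"
        " AND sgp.chr_end >= %d AND sgp.chr_start <= %d"
        " AND sgp.chr_name = '%s' AND f.score > '%s'",
        start, start, start, start, table, start, end, chr_name, score);
}
```

The caller can then check the return value against the buffer size before
handing the string to mysql_query.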


The table f can vary depending on the precise query. The main join is

    sgp.raw_id = f.contig

with the main restriction being 

    sgp.chr_end >= X AND sgp.chr_start <= Y

getting an interval of rows on sgp. We have appropriate keys on the tables.
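For what it is worth, that pair of conditions is the standard test for
"the interval [chr_start, chr_end] overlaps the requested region [X, Y]":
two intervals overlap exactly when neither ends before the other begins.
A small sketch of the same predicate in C (the function name is made up):

```c
#include <stdbool.h>

/* An sgp row covering [chr_start, chr_end] overlaps the requested
 * region [X, Y] exactly when neither interval ends before the other
 * begins -- the same condition the WHERE clause expresses. */
static bool region_overlaps(long chr_start, long chr_end, long X, long Y)
{
    return chr_end >= X && chr_start <= Y;
}
```

One caveat worth checking with EXPLAIN: a plain B-tree index can generally
use only one of the two range conditions, so the other half of the overlap
test is applied as a row-by-row filter rather than an index range scan.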

Notice the nice IF statements returning computed coordinates, which work
very well and don't seem to slow things down at all...


Cardinality is about 5,000,000 rows for f, with f.contig containing around
400,000 unique IDs (integers), whereas sgp has about 100,000 rows. The
final query returns anywhere between 2 and 150 unique f.contig IDs, which
expand to something like 20 to 5,000 rows in the f table. The query comes
back in between 0.1 and 1.5 seconds (depending on the precise make-up of
the data), and this may soon dominate the page refresh time.

At the moment, the page refresh for larger regions is > 40 seconds, which
is unacceptable to users, so improving this query would be a big win.

I am interested in how we should go about investigating ways of making
this go faster. At the moment we have a rather crude profiling tool for
measuring how fast things come back inside the HTML generation code, and
of course we can run queries from the mysql client to test query times.
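One cheap step up from a crude profiler is bracketing each mysql_query
call with a wall-clock timestamp and logging the elapsed time per query.
A minimal sketch, assuming a POSIX system (the helper name is made up;
the mysql_query call in the comment is only illustrative):

```c
#include <stdio.h>
#include <sys/time.h>

/* Wall-clock milliseconds since an arbitrary epoch; take two readings
 * around a query and subtract to get its round-trip time. */
static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

/* Usage sketch (connection setup and error handling not shown):
 *
 *   double t0 = now_ms();
 *   mysql_query(conn, sqlbuffer);
 *   fprintf(stderr, "query took %.1f ms\n", now_ms() - t0);
 */
```

Logging these per query type makes it easy to see whether the database or
the HTML/GIF generation is the dominant cost for a given page.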


Does anyone have any advice about what we should look into first to
improve turn-around?



thanks



ewan



-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<[EMAIL PROTECTED]>. 
-----------------------------------------------------------------

