Hi, there,

I am have a hard time figuring out why a simple query is extremely slow. I would greatly appreciate if you can shed some light!

     The table is in InnoDB:

CREATE TABLE `rps_hits` (
 `gi` int(10) unsigned NOT NULL default '0',
 `cddid` int(10) unsigned NOT NULL default '0',
 `bit_score` float NOT NULL default '0',
 `evalue` double NOT NULL default '0',
 `identity` smallint(5) unsigned NOT NULL default '0',
 `query_from` smallint(5) unsigned NOT NULL default '0',
 `query_to` smallint(5) unsigned NOT NULL default '0',
 `hit_from` smallint(5) unsigned NOT NULL default '0',
 `hit_to` smallint(5) unsigned NOT NULL default '0',
 `hit_len` smallint(5) unsigned NOT NULL default '0',
 `align_len` smallint(5) unsigned NOT NULL default '0',
 `bz_alignment` blob NOT NULL,
 KEY `gi` (`gi`),
 KEY `cddid` (`cddid`),
 KEY `evalue` (`evalue`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=1410065408 AVG_ROW_LENGTH=300

It is a big table with more than 60 million rows, the rps_hits.ibd file is 22 G. All the queries I mention below were run when no other job were running against the database.

I did a very simple query against the table: "select gi, cddid, evalue from rps_hits where cddid=3161". It took 4 minutes and 29.90 seconds to pull out only 1952 rows. Whereas another simply query on gi "select gi, cddid, evalue from rps_hits where gi=393396" pulled out 1532 rows in just 0.09 second.

      "Expalin" the above query gave:
mysql> explain select gi, cddid, evalue from rps_hits where cddid=3161\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: rps_hits
        type: ref
possible_keys: cddid
         key: cddid
     key_len: 4
         ref: const
        rows: 1376
       Extra: Using where
1 row in set (0.06 sec)

      It looks OK.

Then I did "show index from rps_hits", all the index properties for each of the 3 indexes are the same except cardinality. Column gi has a cardinality of 3084286 and cddid has 14. Though a specific "select count(distinct cddid) from rps_hits" returned 11156. Since 11156 unique cddid is less than 0.01% of the total number of rows in the table, I believed the server decided to do a full table scan (does anyone know the exact percentage number of the total counts that MySQL uses as a criteria when deciding to do a FTS?)

I then use "use index" in the query after I made sure the query and index were no longer in the cache : "select gi, cddid, evalue from rps_hits use index (cddid) where cddid=3161". It still took a long time (2 min 59.79 sec) to return the 1952 rows.

I also noticed that a simple query on evalue like "select gi, cddid, evalue from rps_hits where evalue=1.97906;" is also extremely slow even "force index" was used (5.78 sec for 56 rows).

I have not been able to figure out what went wrong. Since the index on gi worked fine, I am just wondering if the slowness is caused by the large size of the table and that the indexes on cddid and evalue were created as second index and third index respectively in "create table". However I have another huge table with blob column and with comparable size and number of rows to this rps_hits table, if I searched on the third index, it was very fast.

This problem really troubled me and I would greatly appreciate if anyone could give me a hint. Thank you in advance!

Regards,
Zhe





--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to