Hi all
I have this query:
SELECT n.ID, n.CatalogNumber, [...more...], d.ID, d.CatalogNumber, [...more...]
FROM newdarwincoredata n
INNER JOIN darwincoredata d ON n.CatalogNumber = d.CatalogNumber
ORDER BY n.CatalogNumber;
Both tables have exactly the same structure and indices:
mysql> SHOW CREATE TABLE darwincoredata;
CREATE TABLE `darwincoredata` (
`ID` int(10) NOT NULL auto_increment,
`CatalogNumber` varchar(20) NOT NULL,
[...more...],
PRIMARY KEY (`ID`),
UNIQUE KEY `CatalogNumber` (`CatalogNumber`),
UNIQUE KEY `GlobalUniqueIdentifier` (`GlobalUniqueIdentifier`),
KEY `DateLastModified` (`DateLastModified`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
There are 336983 rows in darwincoredata, and 337304 in newdarwincoredata.
The plan for the query varies between using the CatalogNumber index and using a
temporary table (which fails, once it fills up the disk). I get either this:
mysql> EXPLAIN <that query>
+----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+
| 1 | SIMPLE | d | ALL | CatalogNumber | NULL | NULL |
NULL | 336238 | Using temporary; Using filesort |
| 1 | SIMPLE | n | ref | CatalogNumber | CatalogNumber | 22 |
GBIF_wrapper.d.CatalogNumber | 1 | |
+----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+
Or this:
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
| 1 | SIMPLE | d | index | CatalogNumber | CatalogNumber | 22 |
NULL | 326508 | |
| 1 | SIMPLE | n | ref | CatalogNumber | CatalogNumber | 22 |
GBIF_wrapper.d.CatalogNumber | 1 | |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
Executing
SHOW INDEX IN newdarwincoredata; SHOW INDEX IN darwincoredata;
And then doing the EXPLAIN again sometimes changes the plan. I assume this is
because the cardinalities are recalculated.
I expect the CatalogNumber unique keys to always be used for this query. Even
if I do:
SELECT [...]
FROM newdarwincoredata n FORCE INDEX (CatalogNumber)
INNER JOIN darwincoredata d FORCE INDEX (CatalogNumber) ON n.CatalogNumber =
d.CatalogNumber
ORDER BY n.CatalogNumber
The query still sometimes uses a temporary table.
EXPLAIN SELECT STRAIGHT_JOIN [...] FROM newdarwincoredata n INNER JOIN
darwincoredata d ON n.CatalogNumber = d.CatalogNumber ORDER BY n.CatalogNumber
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
| 1 | SIMPLE | n | index | CatalogNumber | CatalogNumber | 22 |
NULL | 336588 | |
| 1 | SIMPLE | d | ref | CatalogNumber | CatalogNumber | 22 |
GBIF_wrapper.n.CatalogNumber | 1 | |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
This seems to always use the index, but I don't want to rely on this without
knowing why -- might it be because n has more rows than d?
MySQL version is 5.0.22-log, x86_64, redhat-linux-gnu.
Thanks for any insights
Matt Blissett