On 1/21/2011 14:21, Kendall Gifford wrote:
Hello everyone, I've got a database on an old Fedora Core 4 server running
MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in question has
just two (InnoDB) tables:

messages (approx 2.5 million records)
recipients (approx 6.5 million records)

These track information about email messages. Each message "has many"
recipient records. The structure of the two tables (omitting irrelevant data
fields) are as follows:

+-------------+------------------+------+-----+---------------------+----------------+
| Field       | Type             | Null | Key | Default             |
Extra          |
+-------------+------------------+------+-----+---------------------+----------------+
| id          | int(10) unsigned |      | PRI | NULL                |
auto_increment |
| sent_at     | datetime         |      | MUL | 0000-00-00 00:00:00
|                |
| ......................... OTHER FIELDS OMITTED FOR BREVITY
....................... |
+-------------+------------------+------+-----+---------------------+----------------+

+-------------+------------------+------+-----+---------------------+----------------+
| Field       | Type             | Null | Key | Default             |
Extra          |
+-------------+------------------+------+-----+---------------------+----------------+
| id          | int(10) unsigned |      | PRI | NULL                |
auto_increment |
| message_id  | int(10) unsigned |      | MUL | 0
|                |
| employee_id | int(10) unsigned | YES  | MUL | NULL
|                |
| ......................... OTHER FIELDS OMITTED FOR BREVITY
....................... |
+-------------+------------------+------+-----+---------------------+----------------+

I have the following query that is just too slow:

SELECT messages.* FROM messages
INNER JOIN recipients ON recipients.message_id = messages.id
WHERE recipients.employee_id = X
GROUP BY messages.id
ORDER BY sent_at DESC
LIMIT 0, 25;

This takes about 44 seconds on average. The query explanation is as follows:

+----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
| id | select_type | table      | type   | possible_keys            |
key          | key_len | ref                                 | rows   |
Extra                                        |
+----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | recipients | ref    | messages_fk,employee_idx |
employee_idx |       5 | const                               | 222640 |
Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | messages   | eq_ref | PRIMARY                  |
PRIMARY      |       4 | email_archive.recipients.message_id |      1
|                                              |
+----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+

I've been doing some searching on the web and have no idea if/how this can
be sped up. Most searches these days reference MySQL 5.x which I'm just not
sure how much applies. I'm hoping that there is something obvious that I'm
missing, or that one of you experts knows what I might be able to change to
speed this query up.

Anyhow, thanks in advance for even so much as reading my message, let alone
replying :).


You need to get rid of the GROUP BY to make this go faster. You can do that by running two queries, one to pick the list of unique recipients.message_id values that match your where condition then another to actually retrieve the message data. Something like this

CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED, PRIMARY KEY (message_id)) ENGINE=MEMORY;

INSERT IGNORE tmpMessages
SELECT message_id
FROM recipients
WHERE employee_id = X;

SELECT messages.* FROM messages
INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
ORDER BY sent_at DESC
LIMIT 0, 25;

By pre-selecting a limited set of message_id values from the recipients table, you seriously reduce the number of rows that need to be scanned. Also, the INSERT IGNORE technique is faster than the GROUP BY because it uses an index to identify any duplicates instead of a scan of all previous unique values.

Please let us all know if this is faster enough. (and don't forget to drop the temp table once you are through using it)

--
Shawn Green
MySQL Principal Technical Support Engineer
Oracle USA, Inc.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to