On Fri, Jan 21, 2011 at 2:01 PM, Shawn Green (MySQL) <shawn.l.gr...@oracle.com> wrote:
> On 1/21/2011 14:21, Kendall Gifford wrote:
>> Hello everyone, I've got a database on an old Fedora Core 4 server
>> running MySQL 4 (mysql-server.x86_64 4.1.12-2.FC4.1). The database in
>> question has just two (InnoDB) tables:
>>
>> messages (approx 2.5 million records)
>> recipients (approx 6.5 million records)
>>
>> These track information about email messages. Each message "has many"
>> recipient records. The structure of the two tables (omitting
>> irrelevant data fields) is as follows:
>>
>> messages:
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | Field       | Type             | Null | Key | Default             | Extra          |
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | id          | int(10) unsigned |      | PRI | NULL                | auto_increment |
>> | sent_at     | datetime         |      | MUL | 0000-00-00 00:00:00 |                |
>> | ......................... OTHER FIELDS OMITTED FOR BREVITY ....................... |
>> +-------------+------------------+------+-----+---------------------+----------------+
>>
>> recipients:
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | Field       | Type             | Null | Key | Default             | Extra          |
>> +-------------+------------------+------+-----+---------------------+----------------+
>> | id          | int(10) unsigned |      | PRI | NULL                | auto_increment |
>> | message_id  | int(10) unsigned |      | MUL | 0                   |                |
>> | employee_id | int(10) unsigned | YES  | MUL | NULL                |                |
>> | ......................... OTHER FIELDS OMITTED FOR BREVITY ....................... |
>> +-------------+------------------+------+-----+---------------------+----------------+
>>
>> I have the following query that is just too slow:
>>
>>> SELECT messages.* FROM messages
>>> INNER JOIN recipients ON recipients.message_id = messages.id
>>> WHERE recipients.employee_id = X
>>> GROUP BY messages.id
>>> ORDER BY sent_at DESC
>>> LIMIT 0, 25;
>>
>> This takes about 44 seconds on average.
>> The query explanation is as follows:
>>
>> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>> | id | select_type | table      | type   | possible_keys            | key          | key_len | ref                                 | rows   | Extra                                        |
>> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>> |  1 | SIMPLE      | recipients | ref    | messages_fk,employee_idx | employee_idx | 5       | const                               | 222640 | Using where; Using temporary; Using filesort |
>> |  1 | SIMPLE      | messages   | eq_ref | PRIMARY                  | PRIMARY      | 4       | email_archive.recipients.message_id |      1 |                                              |
>> +----+-------------+------------+--------+--------------------------+--------------+---------+-------------------------------------+--------+----------------------------------------------+
>>
>> I've been doing some searching on the web and have no idea if/how this
>> can be sped up. Most searches these days reference MySQL 5.x, and I'm
>> just not sure how much of that applies. I'm hoping that there is
>> something obvious that I'm missing, or that one of you experts knows
>> what I might be able to change to speed this query up.
>>
>> Anyhow, thanks in advance for even so much as reading my message, let
>> alone replying :).
>
> You need to get rid of the GROUP BY to make this go faster. You can do
> that by running two queries: one to pick the list of unique
> recipients.message_id values that match your WHERE condition, then
> another to actually retrieve the message data.
> Something like this:
>
> CREATE TEMPORARY TABLE tmpMessages (message_id INT UNSIGNED, PRIMARY KEY
> (message_id)) ENGINE=MEMORY;
>
> INSERT IGNORE tmpMessages
> SELECT message_id
> FROM recipients
> WHERE employee_id = X;
>
> SELECT messages.* FROM messages
> INNER JOIN tmpMessages ON tmpMessages.message_id = messages.id
> ORDER BY sent_at DESC
> LIMIT 0, 25;
>
> By pre-selecting a limited set of message_id values from the recipients
> table, you seriously reduce the number of rows that need to be scanned.
> Also, the INSERT IGNORE technique is faster than the GROUP BY because
> it uses an index to identify any duplicates instead of scanning all
> previous unique values.
>
> Please let us all know if this is fast enough. (And don't forget to
> drop the temp table once you are through using it.)
>
> --
> Shawn Green
> MySQL Principal Technical Support Engineer
> Oracle USA, Inc.
> Office: Blountville, TN

Thanks Shawn, I'm in the process of trying your suggestion now. I'll let
you know how it goes.

-- 
Kendall Gifford
zettab...@gmail.com
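[Editor's sketch] Shawn's two-step technique can be illustrated end to end with a small, self-contained script. This uses Python's sqlite3 module as a stand-in for MySQL (SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE), and all table data here is invented for the example; only the schema and query shape come from the thread:

```python
import sqlite3

# In-memory stand-in for the thread's schema; data is made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, sent_at TEXT)")
cur.execute("""CREATE TABLE recipients (
                   id INTEGER PRIMARY KEY,
                   message_id INTEGER,
                   employee_id INTEGER)""")
cur.execute("CREATE INDEX employee_idx ON recipients (employee_id)")

cur.executemany("INSERT INTO messages VALUES (?, ?)",
                [(1, "2011-01-20 09:00:00"), (2, "2011-01-21 09:00:00")])
# Employee 7 appears twice on message 1, so a plain join would
# duplicate that message -- this is what the GROUP BY was removing.
cur.executemany("INSERT INTO recipients VALUES (?, ?, ?)",
                [(1, 1, 7), (2, 1, 8), (3, 2, 7), (4, 1, 7)])

# Step 1: deduplicate message_id values for the target employee via
# the temp table's primary key, instead of GROUP BY in the main query.
cur.execute("""CREATE TEMPORARY TABLE tmpMessages
               (message_id INTEGER PRIMARY KEY)""")
cur.execute("""INSERT OR IGNORE INTO tmpMessages
               SELECT message_id FROM recipients WHERE employee_id = 7""")

# Step 2: join the small temp table back to messages, newest first.
cur.execute("""SELECT messages.id FROM messages
               INNER JOIN tmpMessages
                       ON tmpMessages.message_id = messages.id
               ORDER BY messages.sent_at DESC
               LIMIT 25""")
rows = [r[0] for r in cur.fetchall()]
print(rows)  # message 2 is newer, so it sorts first: [2, 1]

cur.execute("DROP TABLE tmpMessages")
conn.close()
```

The point of the two steps is that duplicate elimination happens once, against the temp table's primary key, so the final SELECT only sorts the already-unique message ids.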