Help! Nasty big table efficiency issues with GROUP BY

Chris Elsworth Mon, 19 Jul 2004 09:05:45 -0700

Hello,

I wonder if someone could shed some light on a problem that's been bothering
me for months. Please bear with me as I explain it..


I have two tables in question:

CREATE TABLE `News` (
  `FileID` int(8) unsigned NOT NULL auto_increment,
  `Subject` char(200) NOT NULL default '',
  `Category` tinyint(3) unsigned NOT NULL default '0',
  `SubCategory` smallint(5) unsigned NOT NULL default '0',
  `Date` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`FileID`),
  KEY `S_D` (`SubCategory`,`Date`),
  KEY `C_D_P` (`Category`,`Date`,`PostID`),
  KEY `C_P_D` (`Category`,`PostID`,`Date`,
  KEY `Subject` (`Subject`(10)),
  KEY `C_D` (`Category`,`Date`),
  FULLTEXT KEY `ft_Subject` (`Subject`)
) TYPE=MyISAM

CREATE TABLE `FileGroup` (
  `FileID` int(8) unsigned NOT NULL default '0',
  `GroupID` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`GroupID`,`FileID`),
  KEY `F_G` (`FileID`,`GroupID`),
) TYPE=MyISAM


News has about 2.5m rows and FileGroup has 3.1m rows.
For each row in News, there are one or more corresponding rows in
FileGroup, determining in which newsgroups each News article appears.

So, typically:
mysql> select * from News where FileID = 26222004\G
*************************** 1. row ***************************
     FileID: 26222004
    Subject: some_usenet_post
   Category: 7
SubCategory: 706
       Date: 1090239675

mysql> select * from FileGroup where FileID = 26222004\G
*************************** 1. row ***************************
 FileID: 26222004
GroupID: 638
*************************** 2. row ***************************
 FileID: 26222004
GroupID: 799


The problem occurs when I want to get News rows that do not appear
in a specific group or set of groups. I also only want one row per News
article, not one row per group, so I have a GROUP BY (I could maybe use
DISTINCT too but they'd do pretty much the same) in there.

I end up with something like this:

SELECT * FROM News
  JOIN FileGroup ON (News.FileID = FileGroup.FileID)
  WHERE GroupID NOT IN (638) AND Category = 7
  GROUP BY News.FileID ORDER BY Date DESC 
  LIMIT 100

The resulting explain:
+-----------+------+-----------------------------+-------+---------+-------------+--------+----------------------------------------------+
| table     | type | possible_keys               | key   | key_len | ref         | 
rows   | Extra                                        |
+-----------+------+-----------------------------+-------+---------+-------------+--------+----------------------------------------------+
| News      | ref  | PRIMARY,C_D_P,C_P_D,C_D     | C_P_D |       1 | const       | 
595494 | Using where; Using temporary; Using filesort |
| FileGroup | ref  | F_G                         | F_G   |       4 | News.FileID |     
 1 | Using where                                  |
+-----------+------+-----------------------------+-------+---------+-------------+--------+----------------------------------------------+

MySQL is being forced to create a temporary table because of the GROUP BY,
and it pretty much seems to end up scanning the entire table - queries are
taking upwards of 30 seconds. In the queries that I can remove the JOIN (I
don't always need it, because I don't always need to exclude items in
certain groups) it flies, because the temporary table and filesort pretty
much always go; I can fiddle with indexes to make that always the case; the
GROUP BY can go too, since the rows will always be unique.

I've typed myself out now so I'll keep the actual question short :)
What can I do about this? Is there a more efficient way to store this data
to avoid these horrific queries?

If I can supply any more relevant information I'll be only too pleased to.

Thanks for any hints in advance.

-- 
Chris

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Help! Nasty big table efficiency issues with GROUP BY

Reply via email to