Chris,

Have you checked your following server configurables:

sort_buffer_size:
 - This is the size of the cache created by _each_ thread that requires
ORDER BY or GROUP BY in a query.
If you are doing a lot of large ordered queries you will need to increase
this value otherwise MySQL will use _disk_ for sorting (this is very slow
and largely undesirable).

tmp_table_size:
 - This is the maximum size of an "in memory" or HEAP temporary table. If a
GROUP BY query causes the server to exceed this limit, an "on disk" table
will be used. Once again, this is slow and undesirable.

Tweaking some of these settings may bring you some joy..

http://dev.mysql.com/doc/mysql/en/Server_parameters.html
http://dev.mysql.com/doc/mysql/en/Server_system_variables.html


Hope this helps,
Lachlan

-----Original Message-----
From: Chris Elsworth [mailto:[EMAIL PROTECTED] Behalf Of
Chris Elsworth
Sent: Tuesday, 20 July 2004 2:00 AM
To: [EMAIL PROTECTED]
Subject: Help! Nasty big table efficiency issues with GROUP BY


Hello,

I wonder if someone could shed some light on a problem that's been bothering
me for months. Please bear with me as I explain it..

I have two tables in question:

CREATE TABLE `News` (
  `FileID` int(8) unsigned NOT NULL auto_increment,
  `Subject` char(200) NOT NULL default '',
  `Category` tinyint(3) unsigned NOT NULL default '0',
  `SubCategory` smallint(5) unsigned NOT NULL default '0',
  `Date` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`FileID`),
  KEY `S_D` (`SubCategory`,`Date`),
  KEY `C_D_P` (`Category`,`Date`,`PostID`),
  KEY `C_P_D` (`Category`,`PostID`,`Date`,
  KEY `Subject` (`Subject`(10)),
  KEY `C_D` (`Category`,`Date`),
  FULLTEXT KEY `ft_Subject` (`Subject`)
) TYPE=MyISAM

CREATE TABLE `FileGroup` (
  `FileID` int(8) unsigned NOT NULL default '0',
  `GroupID` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`GroupID`,`FileID`),
  KEY `F_G` (`FileID`,`GroupID`),
) TYPE=MyISAM


News has about 2.5m rows and FileGroup has 3.1m rows.
For each row in News, there are one or more corresponding rows in
FileGroup, determining in which newsgroups each News article appears.

So, typically:
mysql> select * from News where FileID = 26222004\G
*************************** 1. row ***************************
     FileID: 26222004
    Subject: some_usenet_post
   Category: 7
SubCategory: 706
       Date: 1090239675

mysql> select * from FileGroup where FileID = 26222004\G
*************************** 1. row ***************************
 FileID: 26222004
GroupID: 638
*************************** 2. row ***************************
 FileID: 26222004
GroupID: 799


The problem occurs when I want to get News rows that do not appear
in a specific group or set of groups. I also only want one row per News
article, not one row per group, so I have a GROUP BY (I could maybe use
DISTINCT too but they'd do pretty much the same) in there.

I end up with something like this:

SELECT * FROM News
  JOIN FileGroup ON (News.FileID = FileGroup.FileID)
  WHERE GroupID NOT IN (638) AND Category = 7
  GROUP BY News.FileID ORDER BY Date DESC
  LIMIT 100

The resulting explain:
+-----------+------+-----------------------------+-------+---------+--------
-----+--------+----------------------------------------------+
| table     | type | possible_keys               | key   | key_len | ref
| rows   | Extra                                        |
+-----------+------+-----------------------------+-------+---------+--------
-----+--------+----------------------------------------------+
| News      | ref  | PRIMARY,C_D_P,C_P_D,C_D     | C_P_D |       1 | const
| 595494 | Using where; Using temporary; Using filesort |
| FileGroup | ref  | F_G                         | F_G   |       4 |
News.FileID |      1 | Using where                                  |
+-----------+------+-----------------------------+-------+---------+--------
-----+--------+----------------------------------------------+

MySQL is being forced to create a temporary table because of the GROUP BY,
and it pretty much seems to end up scanning the entire table - queries are
taking upwards of 30 seconds. In the queries that I can remove the JOIN (I
don't always need it, because I don't always need to exclude items in
certain groups) it flies, because the temporary table and filesort pretty
much always go; I can fiddle with indexes to make that always the case; the
GROUP BY can go too, since the rows will always be unique.

I've typed myself out now so I'll keep the actual question short :)
What can I do about this? Is there a more efficient way to store this data
to avoid these horrific queries?

If I can supply any more relevant information I'll be only too pleased to.

Thanks for any hints in advance.

--
Chris

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to