Jon Molin wrote:
Hi list

I have 5 tables:

words (word_id int auto_increment, word varbinary(40)) (has ~3.5M
rows) with the keys:
PRIMARY KEY  (`word_id`),UNIQUE KEY `word_ind` (`word`)

phrases (phrase_id int auto_increment, phrase varbinary(100)) (has
~11M rows) with the keys:
PRIMARY KEY  (`phrase_id`),UNIQUE KEY `phrase_ind` (`phrase`)

phrase_words (phrase_id, word_id) (has ~31M rows) with:
UNIQUE KEY `phrase_ind` (`phrase_id`,`word_id`)
KEY `word` (`word_id`), KEY `phrase` (`phrase_id`)

word_searches (word_id, search_date date, search hour char(2), amount
smallint, type char(8), location char(2)) with:
KEY `word_search` (`word_id`),
KEY `id_search` (`search_date`),
KEY `word_date` (`word_id`,`search_date`)

(and a similar for phrase_searches, these two tables are merge tables
with one table for each month, each table having 15-30M rows)

phrases are built of  "words" identified by phrase_words (these are
not human language words and phrases but rather random bytes where
some are human readable).

Now, I'm trying to find out "how many times has word 1..n been
searched for and how many times has phrases containing 1..n been
searched for?"

These queries take a really long time to execute, first I select for the words:
explain sELECT w.word as word, w.word_id as word_id, sum(ws.amount) as
amount FROM words w, word_searches ws WHERE
ws.word_id=w.word_id AND w.word IN ("ppppp", "xyz", "zzz", "abc") AND
             ws.search_date >= '2006-07-17' AND ws.search_date <=
'2006-08-16' group by ws.word_id;
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
| id | select_type | table | type  | possible_keys                   |
key         | key_len | ref                  | rows | Extra
                           |
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
|  1 | SIMPLE      | w     | range | PRIMARY,word_ind                |
word_ind    | 42      | NULL                 |    4 | Using where;
Using temporary; Using filesort |
|  1 | SIMPLE      | ws    | ref   | word_search,id_search,word_date |
word_search | 4       | statistics.w.word_id |   15 | Using where
                           |
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+

and then for phrases:
explain SELECT pw.word_id as word_id, sum(ps.amount) as amount FROM
phrase_words pw, phrase_searches ps WHERE
               pw.word_id IN (966,1,1250,1741) AND
               pw.phrase_id = ps.phrase_id AND ps.search_date >=
'2006-07-17' AND ps.search_date <= '2006-08-16'
               GROUP by pw.word_id;
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type  | possible_keys
 | key           | key_len | ref                     | rows   | Extra
     |
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+
|  1 | SIMPLE      | pw    | range | phrase_ind,word,phrase
 | word          | 4       | NULL                    | 226847 | Using
where |
|  1 | SIMPLE      | ps    | ref   |
phrase_search,id_search,phrase_date | phrase_search | 4       |
statistics.pw.phrase_id |     15 | Using where |
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+

The problem is it's picking the "word" index which apparently is returning 226,000+ areas.

Test this:

EXPLAIN SELECT pw.word_id as word_id, sum(ps.amount) as amount FROM
phrase_words pw, phrase_searches ps IGNORE INDEX (word) WHERE
pw.word_id IN (966,1,1250,1741) AND
pw.phrase_id = ps.phrase_id AND ps.search_date >=
'2006-07-17' AND ps.search_date <= '2006-08-16'
GROUP by pw.word_id;

(that should ignore the 'word' index and instead use the 'phrase' index).

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to