Hi list

I have 5 tables:

words (word_id int auto_increment, word varbinary(40)) (has ~3.5M
rows) with the keys:
PRIMARY KEY  (`word_id`),UNIQUE KEY `word_ind` (`word`)

phrases (phrase_id int auto_increment, phrase varbinary(100)) (has
~11M rows) with the keys:
PRIMARY KEY  (`phrase_id`),UNIQUE KEY `phrase_ind` (`phrase`)

phrase_words (phrase_id, word_id) (has ~31M rows) with:
UNIQUE KEY `phrase_ind` (`phrase_id`,`word_id`)
KEY `word` (`word_id`), KEY `phrase` (`phrase_id`)

word_searches (word_id, search_date date, search hour char(2), amount
smallint, type char(8), location char(2)) with:
KEY `word_search` (`word_id`),
KEY `id_search` (`search_date`),
KEY `word_date` (`word_id`,`search_date`)

(and a similar for phrase_searches, these two tables are merge tables
with one table for each month, each table having 15-30M rows)

phrases are built of  "words" identified by phrase_words (these are
not human language words and phrases but rather random bytes where
some are human readable).

Now, I'm trying to find out "how many times has word 1..n been
searched for and how many times has phrases containing 1..n been
searched for?"

These queries take a really long time to execute, first I select for the words:
explain sELECT w.word as word, w.word_id as word_id, sum(ws.amount) as
amount FROM words w, word_searches ws WHERE
ws.word_id=w.word_id AND w.word IN ("ppppp", "xyz", "zzz", "abc") AND
             ws.search_date >= '2006-07-17' AND ws.search_date <=
'2006-08-16' group by ws.word_id;
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
| id | select_type | table | type  | possible_keys                   |
key         | key_len | ref                  | rows | Extra
                           |
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+
|  1 | SIMPLE      | w     | range | PRIMARY,word_ind                |
word_ind    | 42      | NULL                 |    4 | Using where;
Using temporary; Using filesort |
|  1 | SIMPLE      | ws    | ref   | word_search,id_search,word_date |
word_search | 4       | statistics.w.word_id |   15 | Using where
                           |
+----+-------------+-------+-------+---------------------------------+-------------+---------+----------------------+------+----------------------------------------------+

and then for phrases:
explain SELECT pw.word_id as word_id, sum(ps.amount) as amount FROM
phrase_words pw, phrase_searches ps WHERE
               pw.word_id IN (966,1,1250,1741) AND
               pw.phrase_id = ps.phrase_id AND ps.search_date >=
'2006-07-17' AND ps.search_date <= '2006-08-16'
               GROUP by pw.word_id;
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type  | possible_keys
 | key           | key_len | ref                     | rows   | Extra
     |
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+
|  1 | SIMPLE      | pw    | range | phrase_ind,word,phrase
 | word          | 4       | NULL                    | 226847 | Using
where |
|  1 | SIMPLE      | ps    | ref   |
phrase_search,id_search,phrase_date | phrase_search | 4       |
statistics.pw.phrase_id |     15 | Using where |
+----+-------------+-------+-------+-------------------------------------+---------------+---------+-------------------------+--------+-------------+

The queries takes 40s-several minutes on a dual xeon 3GHz with 4GB ram
only running Msql.

Can someone see something I've done wrong? I have the same data in
flat files with one word and phrase on each row and one file for each
day and doing grep/sort/uniq -c in all thoose files is quicker on a
slower server with a lot of other procesess and with the files nfs
mounted.

mysqladmin status doesn't show any slow queries:
Uptime: 1215323  Threads: 2  Questions: 2191970  Slow queries: 0
Opens: 0  Flush tables: 1  Open tables: 64  Queries per second avg:
1.804

Thanks in advance
/Jon

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to