Hello Paul,
Paul Halliday wrote:
I have 2 tables:
1) Event Data
2) Mappings
The query should return something like this:
Hits IP Country Code
20 213.136.52.29 SE
I am trying this:
SELECT COUNT(event.src_ip) AS count, INET_NTOA(event.src_ip),
mappings.cc FROM event, mappings WHERE event.timestamp BETWEEN
'2009-12-06 20:00' and '2009-12-07 20:00:00' AND event.src_ip BETWEEN
'mappings.start_ip' AND 'mappings.end_ip' GROUP BY event.src_ip ORDER
BY count DESC LIMIT 20;
Am I supposed to do a join somewhere? Do joins even apply in a where
clause? or am I totally off the mark.
Singularly, the queries look like this:
SELECT cc FROM mappings WHERE INET_ATON('src_ip') BETWEEN start_ip AND end_ip;
SELECT COUNT(src_ip) AS count, INET_NTOA(src_ip) FROM event WHERE
timestamp BETWEEN '2009-12-06 20:00' and '2009-12-07 20:00:00' GROUP
BY src_ip ORDER BY count DESC LIMIT 20;
Technically speaking, you are already doing a JOIN in your query. The
comma operator in the FROM clause combined with the WHERE conditions
make your query logically equivalent to the following rewrite:
SELECT COUNT(event.src_ip) AS count
, INET_NTOA(event.src_ip)
, mappings.cc
FROM event
INNER JOIN mappings
ON event.src_ip BETWEEN 'mappings.start_ip' AND 'mappings.end_ip'
WHERE event.timestamp BETWEEN '2009-12-06 20:00' and '2009-12-07 20:00:00'
GROUP BY event.src_ip
ORDER BY count DESC
LIMIT 20;
However, as we document in the manual, we have demoted the execution
precedence of the comma operator to form what I like to call an
"implicit join" to be evaluated AFTER any explicit JOIN clauses.
http://dev.mysql.com/doc/refman/5.0/en/join.html
What this means is that you may get better performance out of an
explicit join than you do an implicit join.
Also, you may want to consider rewriting your matching condition so that
it can use an index or a combination of indexes on your `start_ip` and
`end_ip` columns (notice my use of backticks ` ` not single quotes ' '
to identify column names) by rewriting your condition as an AND
comparison instead of a BETWEEN comparison
ON event.src_ip >= mappings.start_ip
AND event.src_ip <= mappings.end_ip
The way it is written now: "<col> BETWEEN <colA> and <colB>" is not one
of the cases that we have an optimized and generalized execution plan to
handle well. The separate conditions, like I wrote in my example, is an
optimized situation and has a much better likelihood of using an index
during its evaluation.
I would think that an index on both columns would work better than two
single-column indexes.
ALTER TABLE event ADD KEY(src_ip, end_ip);
Warmest regards,
--
Shawn Green, MySQL Senior Support Engineer
Sun Microsystems, Inc.
Office: Blountville, TN
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[email protected]