[EMAIL PROTECTED] wrote:
Yes, a ranged query should respond faster than a negation. In some cases
you can seriously improve query performance for a negation query if you
split it into two range queries unioned together. Here is a pseudo
example:
This query should be slow due to the table scan it takes to test the
condition:
SELECT ...
FROM ...
WHERE indexfield <> 16
This query will use the index if the number of rows with indexfield != 16 is
small enough (< about 30%).
This query should be noticeably faster due to using ranged queries
(partial index ranged matches):
( SELECT ...
FROM ...
WHERE indexfield < 16
) UNION (
SELECT ...
FROM ...
WHERE indexfield > 16
)
At the very least, you'll want to make that UNION ALL, rather than just UNION,
so mysql doesn't waste time looking for duplicates to remove. But even then,
this may be slower.
First, these two unioned queries may still not use the index. If the number of
rows with indexfield < 16 is too large (more than about 30%) the first query
will be a table scan. If the number of rows with indexfield > 16 is too large
(more than about 30%) the second query will be a table scan. In fact, if the
number of rows with indexfield = 16 is less than about 40% of the table, then at
least one of the two unioned queries is guaranteed to be a table scan. Worse
yet, this query stands a good chance of being twice as long as the single, !=
query, because it may require 2 table scans!
Second, even if both unioned queries use the index, the result still may not be
faster. If the combined number of matching rows is too high, the full table
scan should be faster than the indexed lookups.
For example:
SELECT cat, COUNT(*) FROM inits GROUP BY cat;
+------+----------+
| cat | COUNT(*) |
+------+----------+
| 0 | 5743 |
| 1 | 3792 |
| 2 | 30727 |
| 3 | 1926 |
| 4 | 7812 |
+------+----------+
19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side.
First, the != case:
EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: inits
type: ALL
possible_keys: cat_idx
key: NULL
key_len: NULL
ref: NULL
rows: 50000
Extra: Using where
As expected, a full table scan.
SELECT * FROM inits WHERE cat != 2;
...
19273 rows in set (0.37 sec)
Now, the unioned range queries:
EXPLAIN SELECT * FROM inits WHERE cat < 2
UNION ALL
SELECT * FROM inits WHERE cat > 2 \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: inits
type: range
possible_keys: cat_idx
key: cat_idx
key_len: 5
ref: NULL
rows: 5680
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: UNION
table: inits
type: range
possible_keys: cat_idx
key: cat_idx
key_len: 5
ref: NULL
rows: 6543
Extra: Using where
*************************** 3. row ***************************
id: NULL
select_type: UNION RESULT
table: <union1,2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
As hoped, mysql plans to use the index for each query.
SELECT * FROM inits WHERE cat < 2
UNION ALL
SELECT * FROM inits WHERE cat > 2;
...
19273 rows in set (0.78 sec)
Despite (because of) using the index, this takes more than twice as long!
Of course, if you have to do a table scan ANYWAY (because a value you have
in a constraint is not in an index) then this won't help. This
optimization is all about getting the engine to use an index whenever it
can instead of performing a table scan. Of course, that is not always
possible.
Even when using an index is possible, it is not always desirable. I'd suggest
not trying to outfox the optimizer until you've first determined it is making
bad choices, and then test to make sure the solution is actually an improvement.
Michael
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]