Re: Query Optimization Question

Michael Stassen Tue, 14 Mar 2006 13:48:06 -0800

[EMAIL PROTECTED] wrote:

Yes, a ranged query should respond faster than a negation. In some casesyou can seriously improve query performance for a negation query if yousplit it into two range queries unioned together. Here is a pseudoexample:
This query should be slow due to the table scan it takes to test thecondition:
SELECT ...
FROM ...
WHERE indexfield <> 16

This query will use the index if the number of rows with indexfield != 16 issmall enough (< about 30%).

This query should be noticeably faster due to using ranged queries(partial index ranged matches):
(       SELECT ...
FROM ...
WHERE indexfield < 16
) UNION (
        SELECT ...
FROM ...
WHERE indexfield > 16
)

At the very least, you'll want to make that UNION ALL, rather than just UNION,so mysql doesn't waste time looking for duplicates to remove. But even then,this may be slower.

First, these two unioned queries may still not use the index. If the number ofrows with indexfield < 16 is too large (more than about 30%) the first querywill be a table scan. If the number of rows with indexfield > 16 is too large(more than about 30%) the second query will be a table scan. In fact, if thenumber of rows with indexfield = 16 is less than about 40% of the table, then atleast one of the two unioned queries is guaranteed to be a table scan. Worseyet, this query stands a good chance of being twice as long as the single, !=query, because it may require 2 table scans!

Second, even if both unioned queries use the index, the result still may not befaster. If the combined number of matching rows is too high, the full tablescan should be faster than the indexed lookups.


For example:

  SELECT cat, COUNT(*) FROM inits GROUP BY cat;
  +------+----------+
  | cat  | COUNT(*) |
  +------+----------+
  |    0 |     5743 |
  |    1 |     3792 |
  |    2 |    30727 |
  |    3 |     1926 |
  |    4 |     7812 |
  +------+----------+

19273 rows (38.55%) match cat != 2, with roughly half (~19%) on either side.

First, the != case:

  EXPLAIN SELECT * FROM inits WHERE cat != 2 \G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: inits
           type: ALL
  possible_keys: cat_idx
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 50000
          Extra: Using where

As expected, a full table scan.

  SELECT * FROM inits WHERE cat != 2;
  ...
  19273 rows in set (0.37 sec)

Now, the unioned range queries:

  EXPLAIN SELECT * FROM inits WHERE cat < 2
          UNION ALL
          SELECT * FROM inits WHERE cat > 2 \G
  *************************** 1. row ***************************
             id: 1
    select_type: PRIMARY
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 5680
          Extra: Using where
  *************************** 2. row ***************************
             id: 2
    select_type: UNION
          table: inits
           type: range
  possible_keys: cat_idx
            key: cat_idx
        key_len: 5
            ref: NULL
           rows: 6543
          Extra: Using where
  *************************** 3. row ***************************
             id: NULL
    select_type: UNION RESULT
          table: <union1,2>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: NULL
          Extra:


As hoped, mysql plans to use the index for each query.

  SELECT * FROM inits WHERE cat < 2
  UNION ALL
  SELECT * FROM inits WHERE cat > 2;
  ...
  19273 rows in set (0.78 sec)

Despite (because of) using the index, this takes more than twice as long!

Of course, if you have to do a table scan ANYWAY (because a value you havein a constraint is not in an index) then this won't help. Thisoptimization is all about getting the engine to use an index whenever itcan instead of performing a table scan. Of course, that is not alwayspossible.

Even when using an index is possible, it is not always desirable. I'd suggestnot trying to outfox the optimizer until you've first determined it is makingbad choices, and then test to make sure the solution is actually an improvement.


Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Query Optimization Question

Reply via email to