>-----Original Message-----
>From: Gavin Towey [mailto:gto...@ffn.com]
>Sent: Wednesday, May 26, 2010 7:39 PM
>To: je...@gii.co.jp; mysql@lists.mysql.com
>Subject: RE: Slow query using string functions
>
>Jerry,
>
>Are you sure this is really your explain plan for this query?  That's not at
>all what I would expect to see.
>
>Regards,
>Gavin Towey
>
[JS] I'm as sure as I can be. Here's a lot more information:

==========

SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');

DROP TEMPORARY TABLE IF EXISTS feed_new;

CREATE TEMPORARY TABLE feed_new (
        new_title VARCHAR(255), INDEX (new_title)
        );

INSERT INTO feed_new
VALUES

('Automotive Aftermarket in France - Channel Analysis to 2014'),
('Automotive Aftermarket in Germany - Channel Analysis to 2014'),
('Automotive Aftermarket in Italy - Channel Analysis to 2014'),
... about 900 more rows

SELECT
        feed_new.new_title AS `New Title FROM Feed`,
        prod.prod_pub_prod_id AS `Lib Code FROM DB`,
        prod.prod_title AS `Title FROM DB`,
        prod.prod_num AS `Prod Num`,
        prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) = 
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;

==========

>explain
    -> SELECT
    ->  feed_new.new_title AS `New Title FROM Feed`,
    ->  prod.prod_pub_prod_id AS `Lib Code FROM DB`,
    ->  prod.prod_title AS `Title FROM DB`,
    ->  prod.prod_num AS `Prod Num`,
    ->  prod.prod_published AS `Published FROM DB`
    -> FROM feed_new JOIN prod
    -> ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) = 
LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
    -> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
    -> ORDER BY feed_new.new_title
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: feed_new
         type: index
possible_keys: NULL
          key: new_title
      key_len: 768
          ref: NULL
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: prod
         type: ref
possible_keys: pub_id,pub_id_2
          key: pub_id
      key_len: 48
          ref: const
         rows: 9817
        Extra: Using where
2 rows in set (0.00 sec)

==========

I assume that pub_id is the best key to use to limit the rows pulled from 
prod, since it cuts it down to 9817 rows from 110000. I'm guessing that even 
though prod_title is a key (MUL), using it in the expression

  LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)

keeps it from being used as a key.

If I replace the expressions in the ON clause with simple matches, I get a 
very different result:

>explain
    -> SELECT
    ->  feed_new.new_title AS `New Title FROM Feed`,
    ->  prod.prod_pub_prod_id AS `Lib Code FROM DB`,
    ->  prod.prod_title AS `Title FROM DB`,
    ->  prod.prod_num AS `Prod Num`,
    ->  prod.prod_published AS `Published FROM DB`
    -> FROM feed_new JOIN prod
    -> ON feed_new.new_title = prod.prod_title
    -> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
    -> ORDER BY feed_new.new_title
    -> \G
*************************** 1. row **************************
           id: 1
  select_type: SIMPLE
        table: feed_new
         type: index
possible_keys: new_title
          key: new_title
      key_len: 768
          ref: NULL
         rows: 882
        Extra: Using index
*************************** 2. row **************************
           id: 1
  select_type: SIMPLE
        table: prod
         type: ref
possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
          key: prod_title
      key_len: 768
          ref: giiexpr_db.feed_new.new_title
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)
=====

Of course, in this case the number of qualifying rows is vastly different; but 
I would think the optimization has to happen before the results of the SELECT 
are known. No optimizer is that prescient. :-)

I'm not really surprised by the different optimizations, I'm surprised by the 
difference in execution time. Using the function-based expressions in the ON 
clause takes 2:03.38 minutes to return 267 rows (the first time, the second 
time is a little faster); using a simple comparison with no functions takes 
.03 seconds to return 1 row.

I benchmarked the expression I'm using

   select benchmark(10000000,left("asbcdkdfsaklfjdakl;fjasdl;fjasdl;fjasd", 
length("asbcdkdfsaklfjdakl;fjasdl;fjasdl;fjasd") - 5));

and the result was 2.87 seconds. That's ten million evaluations.

So where is the time going?


Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341

www.the-infoshop.com




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

Reply via email to