RE: Slow query using string operator

Jerry Schwartz Wed, 11 Aug 2010 06:39:17 -0700

Yowsers! I expected that eliminating half of the string manipulation would 
help, but I never imagined that the difference would be so great.


The SELECT now runs in well under a second.
=========================

SELECT pub_id FROM pub WHERE pub_code = 'DC' INTO @PUBID;

DROP TEMPORARY TABLE IF EXISTS feed_new;

CREATE TEMPORARY TABLE feed_new (
        new_title VARCHAR(255), PRIMARY KEY (new_title),
        new_title_truncated VARCHAR(255), INDEX (new_title_truncated)
);

#       The next line loads up dummy data, but it "looks like"
#       what I'm really using.

INSERT IGNORE INTO feed_new
        SELECT prod.prod_title, LEFT(prod.prod_title, LENGTH(prod.prod_title) - 
5)
        FROM prod
        WHERE prod.prod_discont = 0
                AND prod.pub_id = @PUBID
                AND RAND() < .01;

SELECT
        feed_new.new_title AS `New Title FROM Feed`,
        prod.prod_pub_prod_id AS `Lib Code FROM DB`,
        prod.prod_title AS `Title FROM DB`,
        prod.prod_num AS `Prod Num`,
        prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON feed_new.new_title_truncated = LEFT(prod.prod_title, 
LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;
117 rows in set (0.25 sec)

EXPLAIN
        SELECT
                feed_new.new_title AS `New Title FROM Feed`,
                prod.prod_pub_prod_id AS `Lib Code FROM DB`,
                prod.prod_title AS `Title FROM DB`,
                prod.prod_num AS `Prod Num`,
                prod.prod_published AS `Published FROM DB`
        FROM feed_new JOIN prod
        ON feed_new.new_title_truncated = LEFT(prod.prod_title, 
LENGTH(prod.prod_title) - 5)
        WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
        ORDER BY feed_new.new_title\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: prod
         type: ref
possible_keys: pub_id,pub_id_2
          key: pub_id
      key_len: 48
          ref: const
         rows: 11041
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: feed_new
         type: ref
possible_keys: new_title_truncated
          key: new_title_truncated
      key_len: 768
          ref: func
         rows: 1
        Extra: Using where; Using index
======================

It look like the optimizer flipped the JOIN around so that it could use the 
key in feed_new.

Thanks for your help, all.


Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com


>-----Original Message-----
>From: Travis Ard [mailto:travis_...@hotmail.com]
>Sent: Tuesday, August 10, 2010 6:53 PM
>To: 'Jerry Schwartz'; mysql@lists.mysql.com
>Subject: RE: Slow query using string operator
>
>Can you create a second, indexed column in your feed_new temp table that
>includes the title without the year appended?  That might allow you to get
>by with a single pass through the larger prod table and avoid reading rows
>from the feed_new table.
>
>-Travis
>
>-----Original Message-----
>From: Jerry Schwartz [mailto:je...@gii.co.jp]
>Sent: Tuesday, August 10, 2010 3:39 PM
>To: mysql@lists.mysql.com
>Subject: Slow query using string operator
>
>I'm running a set of queries that look like this:
>
>===============
>SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');
>
>DROP TEMPORARY TABLE IF EXISTS feed_new;
>
>CREATE TEMPORARY TABLE feed_new (
>       new_title VARCHAR(255), INDEX (new_title)
>       );
>
>INSERT INTO feed_new
>VALUES
>
>('UK Investment Bonds 2010'),
>('UK Protection 2010'),
>('UK Personal Insurance Distribution 2010'),
>('UK Private Medical Insurance 2010'),
>...
>('UK Private Motor Insurance 2010'),
>('Wealth Management for Non-Resident Indians 2010'),
>('Middle Eastern Cards Database 2010')
>;
>
>SELECT
>       feed_new.new_title AS `New Title FROM Feed`,
>       prod.prod_pub_prod_id AS `Lib Code FROM DB`,
>       prod.prod_title AS `Title FROM DB`,
>       prod.prod_num AS `Prod Num`,
>       prod.prod_published AS `Published FROM DB`
>FROM feed_new JOIN prod
>ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
>LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
>WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
>ORDER BY feed_new.new_title;
>
>============
>
>With a relatively small number of rows in `feed_new`, this can take many
>seconds. With 163 rows in `feed_new`, compared against 11234 eligible rows
>in
>prod, it took about 28 seconds. Here's what an EXPLAIN looks like:
>
>*************************** 1. row ***************************
>           id: 1
>  select_type: SIMPLE
>        table: feed_new
>         type: index
>possible_keys: NULL
>          key: PRIMARY
>      key_len: 767
>          ref: NULL
>         rows: 1
>        Extra: Using index
>*************************** 2. row ***************************
>           id: 1
>  select_type: SIMPLE
>        table: prod
>         type: ref
>possible_keys: pub_id,pub_id_2
>          key: pub_id
>      key_len: 48
>          ref: const
>         rows: 11040
>        Extra: Using where
>
>=========
>
>prod.pub_id is an indexed VARCHAR(15).
>
>If I remove the string functions, I don't get what I want -- but the
>remaining
>query runs in .05 seconds. Here's an EXPLAIN of that one:
>
>===============
>us-gii >EXPLAIN
>    -> SELECT
>    ->  feed_new.new_title AS `New Title FROM Feed`,
>    ->  prod.prod_pub_prod_id AS `Lib Code FROM DB`,
>    ->  prod.prod_title AS `Title FROM DB`,
>    ->  prod.prod_num AS `Prod Num`,
>    ->  prod.prod_published AS `Published FROM DB`
>    -> FROM feed_new JOIN prod
>    -> ON feed_new.new_title = prod.prod_title
>    -> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
>    -> ORDER BY feed_new.new_title\G
>*************************** 1. row ***************************
>           id: 1
>  select_type: SIMPLE
>        table: feed_new
>         type: index
>possible_keys: PRIMARY
>          key: PRIMARY
>      key_len: 767
>          ref: NULL
>         rows: 163
>        Extra: Using index
>*************************** 2. row ***************************
>           id: 1
>  select_type: SIMPLE
>        table: prod
>         type: ref
>possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
>          key: prod_title
>      key_len: 768
>          ref: giiexpr_db.feed_new.new_title
>         rows: 1
>        Extra: Using where
>================
>
>Obviously the string manipulation is keeping MySQL from using `prod_title`
>as
>a key, but I wouldn't have thought that using `pub_id` instead would be that
>
>horrific.
>
>Does anyone have any suggestions as to how to speed this business up? I
>can't
>get away without some string manipulation, because I'm looking for "near
>matches" by ignoring the year at the end of the title.
>
>Regards,
>
>Jerry Schwartz
>Global Information Incorporated
>195 Farmington Ave.
>Farmington, CT 06032
>
>860.674.8796 / FAX: 860.674.8341
>E-mail: je...@gii.co.jp
>Web site: www.the-infoshop.com
>
>
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/mysql?unsub=travis_...@hotmail.com
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe:    http://lists.mysql.com/mysql?unsub=je...@gii.co.jp





-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql?unsub=arch...@jab.org

RE: Slow query using string operator

Reply via email to