Yowsers! I expected that eliminating half of the string manipulation would
help, but I never imagined that the difference would be so great.
The SELECT now runs in well under a second.
=========================
SELECT pub_id FROM pub WHERE pub_code = 'DC' INTO @PUBID;
DROP TEMPORARY TABLE IF EXISTS feed_new;
CREATE TEMPORARY TABLE feed_new (
new_title VARCHAR(255), PRIMARY KEY (new_title),
new_title_truncated VARCHAR(255), INDEX (new_title_truncated)
);
# The next line loads up dummy data, but it "looks like"
# what I'm really using.
INSERT IGNORE INTO feed_new
SELECT prod.prod_title, LEFT(prod.prod_title, LENGTH(prod.prod_title) -
5)
FROM prod
WHERE prod.prod_discont = 0
AND prod.pub_id = @PUBID
AND RAND() < .01;
SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON feed_new.new_title_truncated = LEFT(prod.prod_title,
LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title;
117 rows in set (0.25 sec)
EXPLAIN
SELECT
feed_new.new_title AS `New Title FROM Feed`,
prod.prod_pub_prod_id AS `Lib Code FROM DB`,
prod.prod_title AS `Title FROM DB`,
prod.prod_num AS `Prod Num`,
prod.prod_published AS `Published FROM DB`
FROM feed_new JOIN prod
ON feed_new.new_title_truncated = LEFT(prod.prod_title,
LENGTH(prod.prod_title) - 5)
WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
ORDER BY feed_new.new_title\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: prod
type: ref
possible_keys: pub_id,pub_id_2
key: pub_id
key_len: 48
ref: const
rows: 11041
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: feed_new
type: ref
possible_keys: new_title_truncated
key: new_title_truncated
key_len: 768
ref: func
rows: 1
Extra: Using where; Using index
======================
It look like the optimizer flipped the JOIN around so that it could use the
key in feed_new.
Thanks for your help, all.
Regards,
Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032
860.674.8796 / FAX: 860.674.8341
E-mail: [email protected]
Web site: www.the-infoshop.com
>-----Original Message-----
>From: Travis Ard [mailto:[email protected]]
>Sent: Tuesday, August 10, 2010 6:53 PM
>To: 'Jerry Schwartz'; [email protected]
>Subject: RE: Slow query using string operator
>
>Can you create a second, indexed column in your feed_new temp table that
>includes the title without the year appended? That might allow you to get
>by with a single pass through the larger prod table and avoid reading rows
>from the feed_new table.
>
>-Travis
>
>-----Original Message-----
>From: Jerry Schwartz [mailto:[email protected]]
>Sent: Tuesday, August 10, 2010 3:39 PM
>To: [email protected]
>Subject: Slow query using string operator
>
>I'm running a set of queries that look like this:
>
>===============
>SET @PUBID = (SELECT pub_id FROM pub WHERE pub_code = 'DC');
>
>DROP TEMPORARY TABLE IF EXISTS feed_new;
>
>CREATE TEMPORARY TABLE feed_new (
> new_title VARCHAR(255), INDEX (new_title)
> );
>
>INSERT INTO feed_new
>VALUES
>
>('UK Investment Bonds 2010'),
>('UK Protection 2010'),
>('UK Personal Insurance Distribution 2010'),
>('UK Private Medical Insurance 2010'),
>...
>('UK Private Motor Insurance 2010'),
>('Wealth Management for Non-Resident Indians 2010'),
>('Middle Eastern Cards Database 2010')
>;
>
>SELECT
> feed_new.new_title AS `New Title FROM Feed`,
> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> prod.prod_title AS `Title FROM DB`,
> prod.prod_num AS `Prod Num`,
> prod.prod_published AS `Published FROM DB`
>FROM feed_new JOIN prod
>ON LEFT(feed_new.new_title, LENGTH(feed_new.new_title) - 5) =
>LEFT(prod.prod_title, LENGTH(prod.prod_title) - 5)
>WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
>ORDER BY feed_new.new_title;
>
>============
>
>With a relatively small number of rows in `feed_new`, this can take many
>seconds. With 163 rows in `feed_new`, compared against 11234 eligible rows
>in
>prod, it took about 28 seconds. Here's what an EXPLAIN looks like:
>
>*************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: feed_new
> type: index
>possible_keys: NULL
> key: PRIMARY
> key_len: 767
> ref: NULL
> rows: 1
> Extra: Using index
>*************************** 2. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
>possible_keys: pub_id,pub_id_2
> key: pub_id
> key_len: 48
> ref: const
> rows: 11040
> Extra: Using where
>
>=========
>
>prod.pub_id is an indexed VARCHAR(15).
>
>If I remove the string functions, I don't get what I want -- but the
>remaining
>query runs in .05 seconds. Here's an EXPLAIN of that one:
>
>===============
>us-gii >EXPLAIN
> -> SELECT
> -> feed_new.new_title AS `New Title FROM Feed`,
> -> prod.prod_pub_prod_id AS `Lib Code FROM DB`,
> -> prod.prod_title AS `Title FROM DB`,
> -> prod.prod_num AS `Prod Num`,
> -> prod.prod_published AS `Published FROM DB`
> -> FROM feed_new JOIN prod
> -> ON feed_new.new_title = prod.prod_title
> -> WHERE prod.pub_id = @PUBID AND prod.prod_discont = 0
> -> ORDER BY feed_new.new_title\G
>*************************** 1. row ***************************
> id: 1
> select_type: SIMPLE
> table: feed_new
> type: index
>possible_keys: PRIMARY
> key: PRIMARY
> key_len: 767
> ref: NULL
> rows: 163
> Extra: Using index
>*************************** 2. row ***************************
> id: 1
> select_type: SIMPLE
> table: prod
> type: ref
>possible_keys: pub_id,prod_title,pub_id_2,prod_title_fulltext
> key: prod_title
> key_len: 768
> ref: giiexpr_db.feed_new.new_title
> rows: 1
> Extra: Using where
>================
>
>Obviously the string manipulation is keeping MySQL from using `prod_title`
>as
>a key, but I wouldn't have thought that using `pub_id` instead would be that
>
>horrific.
>
>Does anyone have any suggestions as to how to speed this business up? I
>can't
>get away without some string manipulation, because I'm looking for "near
>matches" by ignoring the year at the end of the title.
>
>Regards,
>
>Jerry Schwartz
>Global Information Incorporated
>195 Farmington Ave.
>Farmington, CT 06032
>
>860.674.8796 / FAX: 860.674.8341
>E-mail: [email protected]
>Web site: www.the-infoshop.com
>
>
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/[email protected]
>
>
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/[email protected]
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[email protected]