Improving the performance of joins

Rod Heyd Mon, 16 May 2005 15:05:18 -0700

I have a question about joins.  My situation is as follows:

I have 5 tables identical in structure.  Each table represents essentially
the same data, however, the data in each table represents a different
"version."  The processing involved in generating the values stored in each
table may have changed slightly, and therefore the values calculated may
vary as well.


On top of this structure is a need to identify a particular version of
a record as the "preferred" version.


The version tables look something like this:


Tables Version_(1-5)

------------------------------------------------------------------------------------------------------
ID      |PRODUCT_ID     |DATA1  |DATA2  |DATA3  |.........|DATA_N       |
------------------------------------------------------------------------------------------------------


ID                        is a unique value across all 5 version tables

PRODUCT_ID      is unique to a given version table and identifies the source 
                           "product", but is not unique in all 5 tables, ie, 
given a 
                           PRODUCT_ID=27 all 5 version tables will have a 
single record 
                           with PRODUCT_ID=27. 

DATA1...DATA_N  These are columns with calculated values.



Now in order to identify which version of a given PRODUCT_ID is "preferred,"
we have created a map table that looks similar to this:

Table Preferred_Map

-------------------------------------------------
MAP_ID  |ID     |PRODUCT_ID     |
-------------------------------------------------

MAP_ID is an auto_increment primary key

ID is the unique ID from the Version tables (unique value)

PRODUCT_ID is the PRODUCT_ID from the tables (unique value)


So to find information about records that are preferred, we have
queries that look like this:


SELECT 
        Preferred_Map.PRODUCT_ID, DATA1, DATA2, DATA3

FROM
        Preferred_Map INNER JOIN Version_1 USING(ID,PRODUCT_ID)

WHERE
        DATA1='x' AND DATA2='y' AND DATA3='z';


We run this same query for each of the 5 version tables and union them
together. However the performance is terrible compared with what we
get by running the query on a single version table without the join.  These
queries run anywhere from 6 to 10 times slower with the join than without it,
which is pretty much unacceptable for our application.


Running explain on the above query yields this:


*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Preferred_Map
type: ALL
possible_keys: ID,PRODUCT_ID
key: NULL
key_len: NULL
ref: NULL
rows: 45191
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: Version_1
type: eq_ref
possible_keys: PRIMARY,PRODUCT_ID
key: PRIMARY
key_len: 4
ref: HiRISE_Test.Preferred_Map.ID
rows: 1
Extra: Using where


The bottle neck appears to be the full table scan on the
Preferred_Map table.  Is there a way to optimize this somehow, to
prevent the full table scan?  Or are we better off creating a
"Preferred Products" table with copies of all the records that are
preferred?  At the moment, it appears that the latter option is
better than trying to do this with a bunch of unions and joins.

Thanks for your help!

-Rod

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Improving the performance of joins

Reply via email to