sam wun wrote:

sam wun wrote:

Hi,

Can anyone tell me how to optimize the following sql statement?

$sql_1 =
"SELECT t.prodcode 'Product Code',
        t.prodname 'Product Name',
        ROUND(avg(t.salesvolume),2) 'PastSales Quantity',
        ROUND(avg(tt.salesvolume),2) 'Sales Quantity',
        ROUND(avg(t.netsales),2) 'PastSales',
        ROUND(avg(tt.netsales),2) 'Sales',
        Round(((avg(t.netsales) - avg(tt.netsales))/avg(tt.netsales))*100,2) 
'SalesDiff',
        Round(((avg(t.salesvolume) - 
avg(tt.salesvolume))/avg(tt.salesvolume))*100,2) 'QtyDiff'
 FROM customer c
 INNER JOIN transaction t
        ON c.custcode = t.custcode
        AND date(t.date) >= '$past_date_period_startdate'
        AND date(t.date) <= '$past_date_period_enddate'
        AND c.salescode = '$salescode_param'
 INNER JOIN transaction tt
        ON c.custcode = tt.custcode
        AND date(tt.date) >= '$input_date_period_startdate'
        AND date(tt.date) <= '$input_date_period_enddate'
        AND c.salescode = '$salescode_param'
        AND t.prodcode = tt.prodcode
 INNER JOIN inventory i
        ON i.prodcode = t.prodcode
        AND i.prodcode = tt.prodcode
        AND i.basename = '$basename_param'
 WHERE i.prodname is not NULL
   AND i.basename is not NULL
 GROUP BY c.salescode, i.basename, t.prodcode
 ORDER BY SalesDiff desc;

Thanks
Sam

I see several potential problems:

1) The biggest fixable problem is the use of the date() function. You should never use a function of a column value in a WHERE clause, if you can avoid it, because that prevents the use of an index on that column to help choose rows. In this case, the date() function isn't even needed, so even if there is no index on t.date, or the optimizer doesn't choose to use it, you are still forcing 4 unnecessary function calls per row.

In other words, change

  AND date(t.date) >= '$past_date_period_startdate'
  AND date(t.date) <= '$past_date_period_enddate'

to

  AND t.date >= '$past_date_period_startdate'
  AND t.date <= '$past_date_period_enddate'

or the equivalent

  AND t.date BETWEEN '$past_date_period_startdate'
                 AND '$past_date_period_enddate'

2) In general, you should'nt put things in the JOIN condition which are not part of the JOIN criteria. For example, the condition "c.salescode = '$salescode_param'" is a restriction on which rows in c to consider, not a condition of the JOIN to t or tt, so it belongs in the WHERE clause.

3) You have a complex GROUP BY based on a column from each table, and an ORDER BY based on a calculated value. That rules out using an index to do the grouping or ordering. Hence the dreaded "Using temporary; Using filesort" in your explain output. This appears unavoidable in your case.

4) You have redundant conditions on your JOIN to i. Since t.prodcode = tt.prodcode for every row of your results (because you require this in the join of t to tt), the two conditions "i.prodcode = t.prodcode" and "i.prodcode = tt.prodcode" are equivalent. The optimizer is probably smart enough to notice that, but it will have less work to do if you pick one and delete the other.

5) Because you require "i.basename = '$basename_param'", i.basename cannot be NULL, so there is no need for the conditrion "i.basename is not NULL".

6) Are you sure you need "i.prodname is not NULL"? If you can be certain that all rows in inventory with non NULL prodcode and basename have non NULL prodnames, you could leave that out. I can't assume you don't need it based only on the rest of your query, so I'll leave it in, for now.

Based on the above, I'd suggest you change your query to:

SELECT
 t.prodcode 'Product Code',
 t.prodname 'Product Name',
 ROUND(avg(t.salesvolume),2) 'PastSales Quantity',
 ROUND(avg(tt.salesvolume),2) 'Sales Quantity',
 ROUND(avg(t.netsales),2) 'PastSales',
 ROUND(avg(tt.netsales),2) 'Sales',
 ROUND(((avg(t.netsales)-avg(tt.netsales))/avg(tt.netsales))*100,2)
   AS 'SalesDiff',
 ROUND(((avg(t.salesvolume)-avg(tt.salesvolume))/avg(tt.salesvolume))*100,2)
   AS 'QtyDiff'
 FROM customer c
 JOIN transaction t
    ON c.custcode = t.custcode
 JOIN transaction tt
    ON c.custcode = tt.custcode
    AND t.prodcode = tt.prodcode
 JOIN inventory i
    ON i.prodcode = t.prodcode
 WHERE c.salescode = '$salescode_param'
   AND t.date BETWEEN '$past_date_period_startdate'
                  AND '$past_date_period_enddate'
   AND tt.date BETWEEN '$input_date_period_startdate'
                   AND '$input_date_period_enddate'
   AND i.basename = '$basename_param'
   AND i.prodname is not NULL
 GROUP BY c.salescode, i.basename, t.prodcode
 ORDER BY SalesDiff desc;

Hi, here is the result from the explain command on the query:
<snip>
+-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+
| i | ALL | PRIMARY,basename | NULL | NULL | NULL | 2491 | Using where; Using temporary; Using filesort |
| t | ref | custcode,prodcode | prodcode | 32 | datacube.i.prodcode | 30 | Using where |
| c | eq_ref | PRIMARY,custcode | PRIMARY | 32 | datacube.t.custcode | 1 | |
| tt | ref | custcode,prodcode | prodcode | 32 | datacube.i.prodcode | 30 | Using where |
+-------+--------+-------------------+----------+---------+---------------------+------+----------------------------------------------+

4 rows in set (0.00 sec)

sam

I spent a while trying to make sense of this before I realized that this EXPLAIN doesn't really help with your original question, as you've changed the qery. Still, I can make a couple of comments.


1) The JOIN condition for t and c includes "c.custcode = t.custcode", yet EXPLAIN says the optimizer is choosing the PRIMARY index over the custcode index in the customer table. This suggests to me that you have redundant indexes, which causes the optimizer to waste time considering useless index choices (and slows INSERTs). If you need help cleaning up your indexes, send the output of

  SHOW INDEX FROM tablename

for each of your tables.

2) Since the goal is to choose transactions for a particular customer during a specific date range, your query might benefit from a multi-column index in table transaction on (custcode, date). If you do add that multi-column index, it will also act as a single column index on custcode, so you could then drop any existing single column index on custcode (in transaction). It is also possible a multi-column index on (prodcode, date) could help, depending on your actual data.

Michael

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to