Query Optimisation/Design Problem

Andreas Ahlenstorf Sun, 20 Feb 2005 08:05:21 -0800

Hi,

I have to implement a bestseller list feature into a shop software
and, unfortunately, I don't see a possibility to make it as fast as
it has to be.


The relevant stuff is distributed among four tables [1]. `orders`
contains all the ordered products. I use the following query to
create a per-category bestseller list:

SELECT SQL_NO_CACHE
     COUNT(*) AS `bought`,
     `a`.`article`
FROM
     `orders` AS `a`
LEFT JOIN
     `art2cat` AS `b`
  ON
     `a`.`article` = `b`.`article`
LEFT JOIN
     `categories` AS `c`
  ON
     `b`.`category` = `c`.`id`
WHERE
     `c`.`id` = '10'
  AND
     `a`.`date` > '2005-03-09 06:27:07'
GROUP BY
     `a`.`article`
ORDER BY
     `bought` DESC
LIMIT
     8;

Then I use two SELECT statements and a UNION to make sure that I get
always as much bestsellers as requested even if no item has been
sold (the ids for the IN clause are from `a`.`article` from the
first query).

(
    SELECT SQL_NO_CACHE
        *
    FROM
        `articles`
    WHERE
        `id` IN (328, 1667, 494, 1557, 1003, 909, 1281, 193)
)
    UNION
(
    SELECT SQL_NO_CACHE
        `a`.*
    FROM
        `articles` AS `a`
    LEFT JOIN
        `art2cat` AS `b`
      ON
        `a`.`id` = `b`.`article`
    LEFT JOIN
        `categories` AS `c`
      ON
        `b`.`category` = `c`.`id`
    WHERE
        `a`.`id` NOT IN (328, 1667, 494, 1557, 1003, 909, 1281, 193)
      AND
        `c`.`id` = '10'
    LIMIT
        2
);

The whole thing takes around 12 seconds to execute (1 mio rows in
`orders`, 10'000 rows in `articles`, 50 rows in `categories`, 10'000
rows in `art2cat`). The big problem is the first query with the
GROUP BY which causes a full table scan and has to be executed on
every request.

mysql> EXPLAIN SELECT ... \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: id
          key: id
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ALL
possible_keys: date
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000509
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
         type: ref
possible_keys: article
          key: article
      key_len: 4
          ref: test.a.article
         rows: 1
        Extra: Using where
3 rows in set (0.01 sec)

I already though about different approaches to avoid this.

- Use a counter in the article table so I could directly access the
most bought articles. Not possible, because I need to know which are
the bestsellers of the last week/last two weeks...
- 'Cache' the results in a temporary table (or something like that).
is senseless, 'cos I need to rebuild the bestseller list every few
requests. I can't let some people wait 15 seconds for the requested
page.
- Generate the bestseller list using some background job is
senseless too, 'cos I don't know what type of data is to be queried.

I would be very happy if somebody could give me a hint how to solve
this problem.

Regards,
A.

[1] http://sopic.com/test/tables.txt

P.S: MySQL 4.1, everything is InnoDB

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Query Optimisation/Design Problem

Reply via email to