I would first try refactoring your SQL to use INNER JOIN statements
instead of the comma-separated lists you are currently using, and to
avoid subqueries entirely. Test this and see if it works for you:
SELECT SUM(li.quantity) AS qtysoldytd
FROM LineItem li
INNER JOIN Sales sa
  ON li.saletranID = sa.saletranID
  AND YEAR(sa.solddate) = 2005
INNER JOIN Inventory inv
  ON inv.invID = li.invID
INNER JOIN Category cat
  ON cat.categoryid = inv.categoryid
  AND cat.vendcode = 'AA';
The linkages work like this:
1) LineItem links into Sales through saletranID and YEAR(solddate)
2) LineItem links into Inventory through invID
3) Inventory links to Category through categoryid and vendcode
Because I used INNER JOINs, each link in the chain must exist across all
tables or the row cannot be added to the final results.
Because this query contains several joins and your table sizes are not
insignificant, it becomes a candidate for what I call "piecewise
evaluation". Piecewise evaluation is where you take the full query and
build your desired results in stages. One stage that jumps out at me is
the conversion of vendcode to a list of invID values. Another stage could
be isolating just those line items for 2005. I suggest this because
JOINing two tables (whether by explicit declaration as I do or by
comma-separated lists as you did) is a geometrically expensive operation:
its cost grows with the product of how many rows are participating
from each table. If we start with two tables M and N that have m and n
rows in them, a JOIN operation takes on the order of m*n cycles to
compute. If we can somehow shrink each table participating in the JOIN
(by pre-selecting certain rows) so that we now have m/4 and n/2 rows to
JOIN, that reduces your overall cost to (m*n)/8. When we are discussing
products of m*n on the order of 100 million rows or so, reducing
production time by a factor of 8 is noticeable. The situation is even more
apparent if you add more tables.
Consider if you had tables A, B, and C with a, b, and c rows in
them. If you had to JOIN those three tables to build a query, it would
take a*b*c units of time to complete. If we were only able to reduce each
table by 10%, the overall computation drops to (.9*a)*(.9*b)*(.9*c) =
.729*(a*b*c)
If:
a = 50,000
b = 500,000
c = 800,000 records
The original execution cost is proportional to:
(50000 * 500000 * 800000) = 20000000000000000 (2.0e16)
after 10% reductions through precomputations:
2.0e16 * .729 = 1.458e16
---------------------------------------------------------------------------
# of row combinations NOT fed through the CPU
to be evaluated as being in the result or not:
2.0e16 - 1.458e16 = 5.42e15 = 5420000000000000
How long do you think it takes even a modern computer to do
5420000000000000 tests? It can make a serious difference.
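If you want to sanity-check those figures, a quick back-of-the-envelope
script (Python here, purely for the arithmetic) reproduces them:

```python
# Reproduce the join-cost arithmetic from the discussion above.
a, b, c = 50_000, 500_000, 800_000  # Category, Sales, LineItem row counts

full_cost = a * b * c                             # naive three-way join cost
reduced_cost = (0.9 * a) * (0.9 * b) * (0.9 * c)  # each table trimmed by 10%

print(full_cost)                   # 20000000000000000 (2.0e16)
print(reduced_cost / full_cost)    # 0.729
print(full_cost - reduced_cost)    # 5.42e15 row combinations never examined
```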
Piecewise evaluation works VERY WELL in stored procedures (if you are on
v5.0 or higher) because you can parameterize your queries quite easily and
you are assured of executing the same query pattern every time you need
it.
## stage 1 - identify the line items from 2005
CREATE TEMPORARY TABLE tmpLI (
  KEY(invID)
) SELECT li.invID, li.quantity
FROM LineItem li
INNER JOIN Sales sa
  ON li.saletranID = sa.saletranID
  AND YEAR(sa.solddate) = 2005;
## stage 2 - identify the inventory items for a certain vendor
CREATE TEMPORARY TABLE tmpInv (
  KEY(invID)
) SELECT DISTINCT inv.invID
FROM Inventory inv
INNER JOIN Category cat
  ON cat.categoryid = inv.categoryid
  AND cat.vendcode = 'AA';
## stage 3 - compute your desired results
SELECT SUM(li.quantity) AS qtysoldytd
FROM tmpLI li
INNER JOIN tmpInv inv
  ON inv.invID = li.invID;
## stage 4 - the database is not your momma. Clean up after yourself...
DROP TEMPORARY TABLE tmpLI;
DROP TEMPORARY TABLE tmpInv;
## end query
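If it helps to see the staged pattern end to end, here is a small
self-contained sketch using Python's built-in sqlite3 module on invented
toy data (the table names match this thread, but the sample rows are made
up for illustration; SQLite's strftime() stands in for MySQL's YEAR(), and
the year and vendor code are bound as parameters the way a stored
procedure would accept them):

```python
import sqlite3

# Toy versions of the four tables from the thread; sample rows are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Category (categoryID INTEGER PRIMARY KEY, vendcode TEXT);
CREATE TABLE Inventory (invID INTEGER PRIMARY KEY, categoryID INTEGER);
CREATE TABLE Sales (saletranID INTEGER PRIMARY KEY, solddate TEXT);
CREATE TABLE LineItem (invID INTEGER, quantity INTEGER, saletranID INTEGER);

INSERT INTO Category VALUES (1, 'AA'), (2, 'BB');
INSERT INTO Inventory VALUES (10, 1), (11, 2);
INSERT INTO Sales VALUES (100, '2005-03-01'), (101, '2004-07-15');
INSERT INTO LineItem VALUES (10, 5, 100), (10, 2, 101), (11, 9, 100);
""")

year, vendcode = '2005', 'AA'  # the parameters a stored procedure would take

# stage 1 - line items from the target year
cur.execute("""
CREATE TEMP TABLE tmpLI AS
SELECT li.invID, li.quantity
FROM LineItem li
INNER JOIN Sales sa
  ON li.saletranID = sa.saletranID
  AND strftime('%Y', sa.solddate) = ?
""", (year,))

# stage 2 - inventory items for the target vendor
cur.execute("""
CREATE TEMP TABLE tmpInv AS
SELECT DISTINCT inv.invID
FROM Inventory inv
INNER JOIN Category cat
  ON cat.categoryID = inv.categoryID
  AND cat.vendcode = ?
""", (vendcode,))

# stage 3 - combine the two (now much smaller) intermediate tables
cur.execute("""
SELECT SUM(li.quantity)
FROM tmpLI li
INNER JOIN tmpInv inv
  ON inv.invID = li.invID
""")
total = cur.fetchone()[0]
print(total)  # only the 2005 sale of item 10 (vendor AA) counts: 5

# stage 4 - clean up after yourself
cur.executescript("DROP TABLE tmpLI; DROP TABLE tmpInv;")
conn.close()
```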
I hope that helps (HTH),
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine
[EMAIL PROTECTED] wrote on 11/04/2005 12:28:50 AM:
> I cannot figure this one out. I have a Category table with 50,000
> records, an Inventory table with over 2 million records. A Sales table
> with 500,000 records. And a LineItem table with 800,000 records pairing
> the Inventory ID with the Sales Transaction ID and Quantity. I need to
> generate a Quantity sold year to date for a certain vendor. The vendor
> code can be found in the Category table which has a relationship with
> Inventory. I am trying a SQL statement like this:
>
> select sum(li.quantity) as qtysoldytd from LineItem li, Sales sa where
> (li.saletranID=sa.saletranID and YEAR(sa.solddate)=2005) and li.InvID
> IN (select invID from Inventory where categoryid IN (select categoryid
> from Category where vendcode='AA'))
>
> this yields null when I know there are sales for that vendor in 2005.
> Simplified schemas for the tables are as follows:
> Category:
>
> +------------+------------------+------+-----+---------+----------------+
> | Field      | Type             | Null | Key | Default | Extra          |
> +------------+------------------+------+-----+---------+----------------+
> | vendcode   | char(3)          | YES  | MUL | NULL    |                |
> | categoryID | int(10) unsigned |      | PRI | NULL    | auto_increment |
> +------------+------------------+------+-----+---------+----------------+
>
> Inventory:
> +--------------+---------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +--------------+---------------+------+-----+---------+-------+
> | categoryID | int(11) | YES | MUL | NULL | |
> | invID | int(10) | | PRI | 0 | |
> | itemnum | int(11) | YES | MUL | NULL | |
> +--------------+---------------+------+-----+---------+-------+
>
> Sales:
>
> +------------+------------------+------+-----+---------+----------------+
> | Field      | Type             | Null | Key | Default | Extra          |
> +------------+------------------+------+-----+---------+----------------+
> | saletranID | int(10) unsigned |      | PRI | NULL    | auto_increment |
> | solddate   | datetime         | YES  |     | NULL    |                |
> +------------+------------------+------+-----+---------+----------------+
>
> LineItem:
> +------------+---------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +------------+---------+------+-----+---------+-------+
> | invID | int(10) | YES | MUL | NULL | |
> | quantity | int(10) | YES | | NULL | |
> | saletranID | int(10) | YES | MUL | NULL | |
> +------------+---------+------+-----+---------+-------+
>
> Can anybody shed some light on this and if this is even possible. I have
> indexes in place and the query is still slow to pull.
> Thanks a million,
> Nathan
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
>