I have a query that looks like this
Select fields from
(select blah from table1)
Left outer join
(select blah from table2)
On common_field;
The sub query from table1 and for table2 return about 5000 rows each.
When I run it this way, I don’t see it ending, my beard grows white.
If I split the two subqueries and save the rows in temporary tables, then I
run the final left outer join, it takes seconds to complete.
WHY? Does that indicate poor optimisation by the query parser?
The actual queries for illustration:
/* ran for over 3 hours before I killed it */
SELECT
bm.OrderNumber bm_OrderNumber,
bm.row_count bm_row_count ,
wc.row_count wc_row_count
FROM
(
SELECT
OrderNumber,
COUNT( PartNumber ) row_count
FROM
dm_liveorders_bmcsv_orderlines bmcsvol
GROUP BY
OrderNumber
ORDER BY OrderNumber
)
bm
LEFT OUTER JOIN
(
SELECT
COALESCE( wco.ORDERS_ID, wcol.ORDERS_ID ) ORDERS_ID,
wcol.row_count
FROM
dm_liveorders_wc_orders wco
LEFT OUTER JOIN
(
SELECT
ORDERS_ID,
COUNT( ORDERITEMS_ID ) row_count
FROM
dm_liveorders_wc_orderitems
GROUP BY
ORDERS_ID
)
wcol
ON
wco.ORDERS_ID = wcol.ORDERS_ID
ORDER BY ORDERS_ID
)
wc ON
wc.ORDERS_ID = bm.OrderNumber
WHERE
wc.row_count != bm.row_count
OR wc.ORDERS_ID IS NULL;
/* the "re-factored" queries, run in under a minute */
DROP TABLE IF EXISTS bm;
CREATE MEMORY TEMP TABLE bm AS
SELECT
OrderNumber,
COUNT( PartNumber ) row_count
FROM
dm_liveorders_bmcsv_orderlines bmcsvol
GROUP BY
OrderNumber
ORDER BY OrderNumber;
DROP TABLE IF EXISTS wc;
CREATE MEMORY TEMP TABLE wc AS
SELECT
COALESCE( wco.ORDERS_ID, wcol.ORDERS_ID ) ORDERS_ID,
wcol.row_count
FROM
dm_liveorders_wc_orders wco
LEFT OUTER JOIN
(
SELECT
ORDERS_ID,
COUNT( ORDERITEMS_ID ) row_count
FROM
dm_liveorders_wc_orderitems
GROUP BY
ORDERS_ID
)
wcol
ON
wco.ORDERS_ID = wcol.ORDERS_ID
ORDER BY ORDERS_ID;
SELECT
bm.OrderNumber bm_OrderNumber,
bm.row_count bm_row_count ,
wc.row_count wc_row_count
FROM
bm
LEFT OUTER JOIN
wc ON
wc.ORDERS_ID = bm.OrderNumber
WHERE
wc.row_count != bm.row_count
OR wc.ORDERS_ID IS NULL;
Of course, I’m going to use the latter but what’s wrong with the first
query that makes it run endlessly?
The explain plan for the original query never came back. I let it run in
the background for a few minutes and cancelled it.
The explain plan for the split version is:
Part 1 ======================================
"SELECT
ORDERNUMBER,
COUNT(PARTNUMBER) AS ROW_COUNT
FROM PUBLIC.DM_LIVEORDERS_BMCSV_ORDERLINES BMCSVOL
/* PUBLIC.PRIMARY_KEY_E7 */
/* scanCount: 245447 */
GROUP BY ORDERNUMBER
ORDER BY 1
/* group sorted */
/*
total: 3366
DM_LIVEORDERS_BMCSV_ORDERLINES.PRIMARY_KEY_E7 read: 3366 (100%)
*/"
Part 2 ======================================
"SELECT
COALESCE(WCO.ORDERS_ID, WCOL.ORDERS_ID) AS ORDERS_ID,
WCOL.ROW_COUNT
FROM PUBLIC.DM_LIVEORDERS_WC_ORDERS WCO
/* PUBLIC.DM_LIVEORDERS_WC_ORDERS.tableScan */
/* scanCount: 5395 */
LEFT OUTER JOIN (
SELECT
ORDERS_ID,
COUNT(ORDERITEMS_ID) AS ROW_COUNT
FROM PUBLIC.DM_LIVEORDERS_WC_ORDERITEMS
/* PUBLIC.ORDERS_ID_INDEX */
GROUP BY ORDERS_ID
/* group sorted */
) WCOL
/* SELECT
ORDERS_ID,
COUNT(ORDERITEMS_ID) AS ROW_COUNT
FROM PUBLIC.DM_LIVEORDERS_WC_ORDERITEMS
/++ PUBLIC.ORDERS_ID_INDEX: ORDERS_ID IS ?1 ++/
WHERE ORDERS_ID IS ?1
GROUP BY ORDERS_ID
/++ group sorted ++/: ORDERS_ID = WCO.ORDERS_ID
*/
ON WCO.ORDERS_ID = WCOL.ORDERS_ID
/* scanCount: 10788 */
ORDER BY 1
/*
total: 28911
DM_LIVEORDERS_WC_ORDERITEMS.DM_LIVEORDERS_WC_ORDERITEMS_DATA read: 25803
(89%)
DM_LIVEORDERS_WC_ORDERITEMS.ORDERS_ID_INDEX read: 2562 (8%)
DM_LIVEORDERS_WC_ORDERS.DM_LIVEORDERS_WC_ORDERS_DATA read: 546 (1%)
*/"
--
You received this message because you are subscribed to the Google Groups "H2
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/groups/opt_out.