carltal opened a new issue, #17958:
URL: https://github.com/apache/druid/issues/17958
### Affected Version
29.0.1 and below (have not tested newer versions)
The Druid version where the problem was encountered.
29.0.1
### Description
With the 1st SQL query below, it creates at least twice as many inner query
results as the 2nd SQL query below, resulting in hitting the 100k
druid.server.http.maxSubqueryRows limit much earlier than the 2nd SQL query;
the only difference between them is the top layer ROW_NUMBER function.
SQL 1
`SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER (
ORDER BY
sale_revenue DESC
) AS row_num,
*
FROM
(
SELECT
SUM(sale_quantity) AS sale_quantity,
SUM(sale_total_amount) AS sale_total_amount,
SUM(sale_revenue) AS sale_revenue,
SUM(sale_revenue) AS sale_percent_of_revenue,
SUM(refund_quantity) AS refund_quantity,
SUM(refund_total_amount) AS refund_total_amount,
SUM(refund_revenue) AS refund_revenue,
SUM(refund_revenue) AS refund_percent_of_revenue,
SUM(sale_quantity) - SUM(refund_quantity) AS
netsale_quantity,
SUM(sale_revenue) - SUM(refund_revenue) AS
netsale_revenue,
SUM(sale_revenue) - SUM(refund_revenue) AS
netsale_percent_of_revenue,
FLOOR(TIME_PARSE (event_time, NULL) to MINUTE) AS
event_time_group,
tag_name
FROM
(
SELECT DISTINCT
__time,
CASE
WHEN (NVL (base_product_id, '0') = '0') THEN
variant_product_id
ELSE base_product_id
END as tag_product_id,
od_parent_id,
m1.od_id,
event_time,
tag_name,
sale_quantity,
sale_total_amount,
sale_subtotal_amount,
sale_item_cost_amount,
sale_discount_amount,
sale_taxable_revenue,
sale_non_taxable_revenue,
sale_tax_amount,
sale_revenue,
sale_gross_revenue,
sale_profitability,
refund_quantity,
refund_total_amount,
refund_subtotal_amount,
refund_item_cost_amount,
refund_discount_amount,
refund_taxable_revenue,
refund_non_taxable_revenue,
refund_tax_amount,
refund_revenue,
refund_gross_revenue,
refund_profitability
FROM
master_order_detail_primary AS m1
LEFT JOIN (
SELECT DISTINCT
od_id,
CASE
WHEN (NVL (base_product_id, '0') =
'0') THEN product_id
ELSE base_product_id
END as variant_product_id
FROM
master_order_detail_primary
WHERE
__time >= xxx
AND __time < xxx
AND event_time >= xxx
AND event_time < xxx
AND merchant_id = xxx
AND od_type = xxx
) AS m2 ON m1.od_parent_id = m2.od_id
JOIN (
SELECT
t.tag_id,
t.tag_name,
t.tag_type,
NVL (
(
CASE
WHEN (NVL (base_product_id,
'0') = '0') THEN product_id
ELSE base_product_id
END
),
t.entity_id
) as combo_product_id
FROM
tag_map t
LEFT JOIN (
SELECT distinct
category_id,
product_id,
base_product_id
FROM
master_order_detail_primary
WHERE
__time >= xxx
AND __time < xxx
AND event_time >= xxx
AND event_time < xxx
AND merchant_id = xxx
AND od_type = xxx
) p ON t.entity_id = p.category_id
WHERE
NOT (
t.tag_type = xxx
AND p.product_id IS NOT NULL
)
AND t.merchant_id = xxx
) AS tg ON CASE
WHEN (NVL (variant_product_id, '0') = '0')
THEN (
CASE
WHEN (NVL (base_product_id, '0') =
'0') THEN product_id
ELSE base_product_id
END
)
ELSE variant_product_id
END = tg.combo_product_id
WHERE
__time >= xxx
AND __time < xxx
AND event_time >= xxx
AND event_time < xxx
AND merchant_id = xxx
AND od_type = xxx
)
GROUP BY
FLOOR(TIME_PARSE (event_time, NULL) to MINUTE),
tag_name
limit
9
offset
1
)
) AS ranked
WHERE
row_num > 6
AND row_num <= 9`
SQL 2
`SELECT
SUM(sale_quantity) AS sale_quantity,
SUM(sale_total_amount) AS sale_total_amount,
SUM(sale_revenue) AS sale_revenue,
SUM(sale_revenue) AS sale_percent_of_revenue,
SUM(refund_quantity) AS refund_quantity,
SUM(refund_total_amount) AS refund_total_amount,
SUM(refund_revenue) AS refund_revenue,
SUM(refund_revenue) AS refund_percent_of_revenue,
SUM(sale_quantity) - SUM(refund_quantity) AS netsale_quantity,
SUM(sale_revenue) - SUM(refund_revenue) AS netsale_revenue,
SUM(sale_revenue) - SUM(refund_revenue) AS netsale_percent_of_revenue,
FLOOR(TIME_PARSE (event_time, NULL) to MINUTE) AS event_time_group,
tag_name
FROM
(
SELECT DISTINCT
__time,
CASE
WHEN (NVL (base_product_id, '0') = '0') THEN
variant_product_id
ELSE base_product_id
END as tag_product_id,
od_parent_id,
m1.od_id,
event_time,
tag_name,
sale_quantity,
sale_total_amount,
sale_subtotal_amount,
sale_item_cost_amount,
sale_discount_amount,
sale_taxable_revenue,
sale_non_taxable_revenue,
sale_tax_amount,
sale_revenue,
sale_gross_revenue,
sale_profitability,
refund_quantity,
refund_total_amount,
refund_subtotal_amount,
refund_item_cost_amount,
refund_discount_amount,
refund_taxable_revenue,
refund_non_taxable_revenue,
refund_tax_amount,
refund_revenue,
refund_gross_revenue,
refund_profitability
FROM
master_order_detail_primary AS m1
LEFT JOIN (
SELECT DISTINCT
od_id,
CASE
WHEN (NVL (base_product_id, '0') = '0') THEN
product_id
ELSE base_product_id
END as variant_product_id
FROM
master_order_detail_primary
WHERE
__time >= xxx
AND __time < xxx
AND event_time >= xxx
AND event_time < xxx
AND merchant_id = xxx
AND od_type = xxx
) AS m2 ON m1.od_parent_id = m2.od_id
JOIN (
SELECT
t.tag_id,
t.tag_name,
t.tag_type,
NVL (
(
CASE
WHEN (NVL (base_product_id, '0') = '0') THEN
product_id
ELSE base_product_id
END
),
t.entity_id
) as combo_product_id
FROM
tag_map t
LEFT JOIN (
SELECT distinct
category_id,
product_id,
base_product_id
FROM
master_order_detail_primary
WHERE
__time >= xxx
AND __time < xxx
AND event_time >= xxx
AND event_time < xxx
AND merchant_id = xxx
AND od_type = xxx
) p ON t.entity_id = p.category_id
WHERE
NOT (
t.tag_type = xxx
AND p.product_id IS NOT NULL
)
AND t.merchant_id = xxx
) AS tg ON CASE
WHEN (NVL (variant_product_id, '0') = '0') THEN (
CASE
WHEN (NVL (base_product_id, '0') = '0') THEN
product_id
ELSE base_product_id
END
)
ELSE variant_product_id
END = tg.combo_product_id
WHERE
__time >= xxx
AND __time < xxx
AND event_time >= xxx
AND event_time < xxx
AND merchant_id = xxx
AND od_type = xxx
)
GROUP BY
FLOOR(TIME_PARSE (event_time, NULL) to MINUTE),
tag_name`
Some background information:The layer under the ROW_NUMBER layer (the SUM
and GROUP BY layer) only has 214 results, while the layer further down would
have around 30k inner query results.
It does not make much sense to us that a single SELECT layer (with no group
by, no join, no nothing) could more than double the number of inner query
results that is 2 or more layers deeper.
Please include as much detailed information about the problem as possible.
- Cluster size: 6-8 datasources each having a million records
- Configurations in use: enabled windowing function
- The error message or stack traces encountered. Providing more context,
such as nearby log messages or even entire logs, can be helpful: resource limit
exception: 100k maxSubqueryRows reached.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]