[I] Druid Window Function Creates More Inner Query Rows (druid)

via GitHub Tue, 29 Apr 2025 08:44:07 -0700


carltal opened a new issue, #17958:
URL: https://github.com/apache/druid/issues/17958


   ### Affected Version
   29.0.1 and below (have not tested newer versions)
   
   The Druid version where the problem was encountered.
   29.0.1
   
   ### Description
   With the 1st SQL query below, it creates at least twice as many inner query 
results as the 2nd SQL query below, resulting in hitting the 100k 
druid.server.http.maxSubqueryRows limit much earlier than the 2nd SQL query; 
the only difference between them is the top layer ROW_NUMBER function. 
   
   SQL 1
   `SELECT
       *
   FROM
       (
           SELECT
               ROW_NUMBER() OVER (
                   ORDER BY
                       sale_revenue DESC
               ) AS row_num,
               *
           FROM
               (
                   SELECT
                       SUM(sale_quantity) AS sale_quantity,
                       SUM(sale_total_amount) AS sale_total_amount,
                       SUM(sale_revenue) AS sale_revenue,
                       SUM(sale_revenue) AS sale_percent_of_revenue,
                       SUM(refund_quantity) AS refund_quantity,
                       SUM(refund_total_amount) AS refund_total_amount,
                       SUM(refund_revenue) AS refund_revenue,
                       SUM(refund_revenue) AS refund_percent_of_revenue,
                       SUM(sale_quantity) - SUM(refund_quantity) AS 
netsale_quantity,
                       SUM(sale_revenue) - SUM(refund_revenue) AS 
netsale_revenue,
                       SUM(sale_revenue) - SUM(refund_revenue) AS 
netsale_percent_of_revenue,
                       FLOOR(TIME_PARSE (event_time, NULL) to MINUTE) AS 
event_time_group,
                       tag_name
                   FROM
                       (
                           SELECT DISTINCT
                               __time,
                               CASE
                                   WHEN (NVL (base_product_id, '0') = '0') THEN 
variant_product_id
                                   ELSE base_product_id
                               END as tag_product_id,
                               od_parent_id,
                               m1.od_id,
                               event_time,
                               tag_name,
                               sale_quantity,
                               sale_total_amount,
                               sale_subtotal_amount,
                               sale_item_cost_amount,
                               sale_discount_amount,
                               sale_taxable_revenue,
                               sale_non_taxable_revenue,
                               sale_tax_amount,
                               sale_revenue,
                               sale_gross_revenue,
                               sale_profitability,
                               refund_quantity,
                               refund_total_amount,
                               refund_subtotal_amount,
                               refund_item_cost_amount,
                               refund_discount_amount,
                               refund_taxable_revenue,
                               refund_non_taxable_revenue,
                               refund_tax_amount,
                               refund_revenue,
                               refund_gross_revenue,
                               refund_profitability
                           FROM
                               master_order_detail_primary AS m1
                               LEFT JOIN (
                                   SELECT DISTINCT
                                       od_id,
                                       CASE
                                           WHEN (NVL (base_product_id, '0') = 
'0') THEN product_id
                                           ELSE base_product_id
                                       END as variant_product_id
                                   FROM
                                       master_order_detail_primary
                                   WHERE
                                       __time >= xxx
                                       AND __time < xxx
                                       AND event_time >= xxx
                                       AND event_time < xxx
                                       AND merchant_id = xxx
                                       AND od_type = xxx
                               ) AS m2 ON m1.od_parent_id = m2.od_id
                               JOIN (
                                   SELECT
                                       t.tag_id,
                                       t.tag_name,
                                       t.tag_type,
                                       NVL (
                                           (
                                               CASE
                                                   WHEN (NVL (base_product_id, 
'0') = '0') THEN product_id
                                                   ELSE base_product_id
                                               END
                                           ),
                                           t.entity_id
                                       ) as combo_product_id
                                   FROM
                                       tag_map t
                                       LEFT JOIN (
                                           SELECT distinct
                                               category_id,
                                               product_id,
                                               base_product_id
                                           FROM
                                               master_order_detail_primary
                                           WHERE
                                               __time >= xxx
                                               AND __time < xxx
                                               AND event_time >= xxx
                                               AND event_time < xxx
                                               AND merchant_id = xxx
                                               AND od_type = xxx
                                       ) p ON t.entity_id = p.category_id
                                   WHERE
                                       NOT (
                                           t.tag_type = xxx
                                           AND p.product_id IS NOT NULL
                                       )
                                       AND t.merchant_id = xxx
                               ) AS tg ON CASE
                                   WHEN (NVL (variant_product_id, '0') = '0') 
THEN (
                                       CASE
                                           WHEN (NVL (base_product_id, '0') = 
'0') THEN product_id
                                           ELSE base_product_id
                                       END
                                   )
                                   ELSE variant_product_id
                               END = tg.combo_product_id
                           WHERE
                               __time >= xxx
                               AND __time < xxx
                               AND event_time >= xxx
                               AND event_time < xxx
                               AND merchant_id = xxx
                               AND od_type = xxx
                       )
                   GROUP BY
                       FLOOR(TIME_PARSE (event_time, NULL) to MINUTE),
                       tag_name
                   limit
                       9
                   offset
                       1
               )
       ) AS ranked
   WHERE
       row_num > 6
       AND row_num <= 9`
   
   SQL 2
   `SELECT
       SUM(sale_quantity) AS sale_quantity,
       SUM(sale_total_amount) AS sale_total_amount,
       SUM(sale_revenue) AS sale_revenue,
       SUM(sale_revenue) AS sale_percent_of_revenue,
       SUM(refund_quantity) AS refund_quantity,
       SUM(refund_total_amount) AS refund_total_amount,
       SUM(refund_revenue) AS refund_revenue,
       SUM(refund_revenue) AS refund_percent_of_revenue,
       SUM(sale_quantity) - SUM(refund_quantity) AS netsale_quantity,
       SUM(sale_revenue) - SUM(refund_revenue) AS netsale_revenue,
       SUM(sale_revenue) - SUM(refund_revenue) AS netsale_percent_of_revenue,
       FLOOR(TIME_PARSE (event_time, NULL) to MINUTE) AS event_time_group,
       tag_name
   FROM
       (
           SELECT DISTINCT
               __time,
               CASE
                   WHEN (NVL (base_product_id, '0') = '0') THEN 
variant_product_id
                   ELSE base_product_id
               END as tag_product_id,
               od_parent_id,
               m1.od_id,
               event_time,
               tag_name,
               sale_quantity,
               sale_total_amount,
               sale_subtotal_amount,
               sale_item_cost_amount,
               sale_discount_amount,
               sale_taxable_revenue,
               sale_non_taxable_revenue,
               sale_tax_amount,
               sale_revenue,
               sale_gross_revenue,
               sale_profitability,
               refund_quantity,
               refund_total_amount,
               refund_subtotal_amount,
               refund_item_cost_amount,
               refund_discount_amount,
               refund_taxable_revenue,
               refund_non_taxable_revenue,
               refund_tax_amount,
               refund_revenue,
               refund_gross_revenue,
               refund_profitability
           FROM
               master_order_detail_primary AS m1
               LEFT JOIN (
                   SELECT DISTINCT
                       od_id,
                       CASE
                           WHEN (NVL (base_product_id, '0') = '0') THEN 
product_id
                           ELSE base_product_id
                       END as variant_product_id
                   FROM
                       master_order_detail_primary
                   WHERE
                       __time >= xxx
                       AND __time < xxx
                       AND event_time >= xxx
                       AND event_time < xxx
                       AND merchant_id = xxx
                       AND od_type = xxx
               ) AS m2 ON m1.od_parent_id = m2.od_id
               JOIN (
                   SELECT
                       t.tag_id,
                       t.tag_name,
                       t.tag_type,
                       NVL (
                           (
                               CASE
                                   WHEN (NVL (base_product_id, '0') = '0') THEN 
product_id
                                   ELSE base_product_id
                               END
                           ),
                           t.entity_id
                       ) as combo_product_id
                   FROM
                       tag_map t
                       LEFT JOIN (
                           SELECT distinct
                               category_id,
                               product_id,
                               base_product_id
                           FROM
                               master_order_detail_primary
                           WHERE
                               __time >= xxx
                               AND __time < xxx
                               AND event_time >= xxx
                               AND event_time < xxx
                               AND merchant_id = xxx
                               AND od_type = xxx
                       ) p ON t.entity_id = p.category_id
                   WHERE
                       NOT (
                           t.tag_type = xxx
                           AND p.product_id IS NOT NULL
                       )
                       AND t.merchant_id = xxx
               ) AS tg ON CASE
                   WHEN (NVL (variant_product_id, '0') = '0') THEN (
                       CASE
                           WHEN (NVL (base_product_id, '0') = '0') THEN 
product_id
                           ELSE base_product_id
                       END
                   )
                   ELSE variant_product_id
               END = tg.combo_product_id
           WHERE
               __time >= xxx
               AND __time < xxx
               AND event_time >= xxx
               AND event_time < xxx
               AND merchant_id = xxx
               AND od_type = xxx
       )
   GROUP BY
       FLOOR(TIME_PARSE (event_time, NULL) to MINUTE),
       tag_name`
   
   Some background information:The layer under the ROW_NUMBER layer (the SUM 
and GROUP BY layer) only has 214 results, while the layer further down would 
have around 30k inner query results.
   
   It does not make much sense to us that a single SELECT layer (with no group 
by, no join, no nothing) could more than double the number of inner query 
results that is 2 or more layers deeper.
   
   Please include as much detailed information about the problem as possible.
   - Cluster size: 6-8 datasources each having a million records
   - Configurations in use: enabled windowing function
   - The error message or stack traces encountered. Providing more context, 
such as nearby log messages or even entire logs, can be helpful: resource limit 
exception: 100k maxSubqueryRows reached.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] Druid Window Function Creates More Inner Query Rows (druid)

Reply via email to