pjain1 opened a new issue, #16728:
URL: https://github.com/apache/druid/issues/16728
Consider a comparison query that compares a metric over two time ranges (a base range and the previous, or comparison, range), like the one below:
```
SELECT
  (COALESCE(base."page", comparison."page")) AS "page",
  (ANY_VALUE(base."added")) AS "added",
  (ANY_VALUE(comparison."added")) AS "added_prev",
  (ANY_VALUE(base."added" - comparison."added")) AS "added_delta"
FROM
  (SELECT "page", SUM(added) AS "added"
   FROM "wikipedia"
   WHERE "__time" >= '2016-06-27T00:00:00.000Z' AND "__time" < '2016-06-27T01:00:00.000Z'
   GROUP BY 1
   ORDER BY "added" DESC
   LIMIT 10) base
LEFT OUTER JOIN
  (SELECT "page", SUM(added) AS "added"
   FROM "wikipedia"
   WHERE "__time" >= '2016-06-27T01:00:00.000Z' AND "__time" < '2016-06-27T02:00:00.000Z'
   GROUP BY 1) comparison
ON
  (base."page" IS NOT DISTINCT FROM comparison."page")
GROUP BY 1
ORDER BY "added" DESC
LIMIT 10
```
Druid computes the base (left) and comparison (right) subqueries and joins them on the `page` dimension. However, if the dimension has high cardinality, the comparison subquery can fail with `ResourceLimitExceededException` because its result may exceed the `maxSubqueryBytes` limit. The limit cannot be pushed down to the comparison subquery, because its top-N values may differ from those of the base subquery. Nor can a subquery that selects the top-N values from the base time range be used in the comparison subquery's WHERE clause: if one of those values is `null`, then `NULL IN (...)` evaluates to NULL (unknown) rather than true, so the `null` row in the comparison subquery is filtered out.
Druid could, however, compute the base subquery first and push its join-key values into the comparison subquery, so that the comparison subquery only returns rows for those keys. Can the planner perform this optimization, or are there other ideas?
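To make the proposed optimization concrete, here is a minimal sketch that simulates it on in-memory rows rather than inside Druid. All function names, the `(page, added)` row shape, and the sample data are hypothetical; this is not Druid's planner API. The sketch runs the base subquery to completion, collects its `page` keys (with `None` standing in for SQL `NULL`), and aggregates only the comparison rows whose key is in that set. Python set membership treats `None` as an ordinary value, so the filter behaves like `IS NOT DISTINCT FROM` rather than `IN`, and the `null` group survives.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical row shape: (page, added); page=None models SQL NULL.
Row = Tuple[Optional[str], int]
Joined = Tuple[Optional[str], int, Optional[int], Optional[int]]


def sum_by_page(rows: Iterable[Row]) -> Dict[Optional[str], int]:
    """Equivalent of: SELECT page, SUM(added) ... GROUP BY page."""
    totals: Dict[Optional[str], int] = defaultdict(int)
    for page, added in rows:
        totals[page] += added
    return dict(totals)


def top_n(totals: Dict[Optional[str], int], n: int) -> List[Tuple[Optional[str], int]]:
    """Equivalent of: ORDER BY added DESC LIMIT n."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def comparison_with_pushdown(
    base_rows: Iterable[Row], comparison_rows: Iterable[Row], n: int
) -> List[Joined]:
    # Phase 1: evaluate the base (left) subquery first.
    base = top_n(sum_by_page(base_rows), n)
    # Phase 2: collect the join keys; None is kept like any other key,
    # mirroring IS NOT DISTINCT FROM instead of IN.
    keys: Set[Optional[str]] = {page for page, _ in base}
    # Only comparison rows whose key appears in the base result are
    # aggregated, so the right side holds at most n groups.
    comparison = sum_by_page(r for r in comparison_rows if r[0] in keys)
    # Phase 3: LEFT OUTER JOIN on page; a missing right side yields None.
    result: List[Joined] = []
    for page, added in base:
        prev = comparison.get(page)
        delta = None if prev is None else added - prev
        result.append((page, added, prev, delta))
    return result
```

With `base_rows = [("A", 5), ("B", 3), (None, 4), ("C", 1)]`, `comparison_rows = [("A", 2), (None, 1), ("D", 100), ("C", 7)]` and `n = 3`, the call returns `[("A", 5, 2, 3), (None, 4, 1, 3), ("B", 3, None, None)]`: the `"D"` and `"C"` comparison rows are never aggregated, while the `None` group is still matched.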