[
https://issues.apache.org/jira/browse/CALCITE-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826682#comment-15826682
]
Jesus Camacho Rodriguez commented on CALCITE-1578:
--------------------------------------------------
We might recognize _ floor\_day_ and adjust the query granularity:
https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidQuery.java#L474
This was one of the changes introduced with the recognition for Druid topN
queries.
What we could do to fix this issue is to avoid pushing the SortLimit operator
if granularity is not all. That would give us the right semantics, which is the
immediate goal.
Then, we could focus on enriching the algebra in a follow-up as you proposed (I
like your proposal) so we could recognize easier this kind of queries.
> Druid adapter: wrong semantics of topN query limit with granularity
> -------------------------------------------------------------------
>
> Key: CALCITE-1578
> URL: https://issues.apache.org/jira/browse/CALCITE-1578
> Project: Calcite
> Issue Type: Bug
> Components: druid
> Affects Versions: 1.11.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Critical
>
> Semantics of Druid topN query with limit and granularity is not equivalent to
> input SQL. In particular, limit is applied on each granularity value, not on
> the overall query.
> Currently, the following query will be transformed into a topN query:
> {code:sql}
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity),
> sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s DESC
> LIMIT 10;
> {code}
> Previous query outputs at most 10 rows. In turn, the equivalent SQL query for
> a Druid topN query should be expressed as:
> {code:sql}
> SELECT rs.i_brand_id, rs.d, rs.m, rs.s
> FROM (
> SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m,
> sum(ss_wholesale_cost) as s,
> ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY
> sum(ss_wholesale_cost) DESC ) AS rownum
> FROM store_sales_sold_time_subset
> GROUP BY i_brand_id, floor_day(`__time`)
> ) rs
> WHERE rownum <= 10;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)