[jira] [Commented] (CALCITE-4213) Druid plans with small intervals should be chosen over full interval scan plus filter

Stamatis Zampetakis (Jira) Wed, 02 Sep 2020 03:44:58 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189143#comment-17189143
 ]


Stamatis Zampetakis commented on CALCITE-4213:
----------------------------------------------

From a quick look, this again seems to be the result of two plans having
exactly the same cost, with one winning over the other only because of the
order in which the rules are applied. I have marked it as a bug, and I guess
also as a regression, since the test used to pass at some point.

I assume that fixing this will again require tweaking the cost model of the Druid adapter.
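
To illustrate the suspected mechanism, here is a minimal, self-contained
sketch. It is not Calcite's planner or the Druid adapter's actual cost code:
the Plan class, the choose method, and both cost functions are hypothetical
names invented for this example. It only shows how a cost tie makes the
winner depend on arrival order, and how a cost that accounts for the scanned
interval would remove the tie.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class CostTieSketch {
    /** Hypothetical candidate plan: label, row estimate, fraction of the time range scanned. */
    public static final class Plan {
        public final String name;
        public final double rows;
        public final double intervalFraction;

        public Plan(String name, double rows, double intervalFraction) {
            this.name = name;
            this.rows = rows;
            this.intervalFraction = intervalFraction;
        }
    }

    /** Keeps the first plan unless a later one is strictly cheaper, so ties go to whichever came first. */
    public static Plan choose(List<Plan> plansInArrivalOrder, ToDoubleFunction<Plan> cost) {
        Plan best = null;
        for (Plan p : plansInArrivalOrder) {
            if (best == null || cost.applyAsDouble(p) < cost.applyAsDouble(best)) {
                best = p;
            }
        }
        return best;
    }

    /** Cost that ignores the scanned interval: the two example plans tie. */
    public static double flatCost(Plan p) {
        return p.rows;
    }

    /** Assumed tie-breaker: scale the cost by how much of the time range is scanned. */
    public static double intervalAwareCost(Plan p) {
        return p.rows * (1.0 + p.intervalFraction);
    }

    public static void main(String[] args) {
        Plan narrow = new Plan("narrow intervals", 100, 0.05);
        Plan full = new Plan("full scan + filter", 100, 1.0);

        // With the flat cost, the result flips with the arrival order.
        System.out.println(choose(Arrays.asList(full, narrow), CostTieSketch::flatCost).name);
        System.out.println(choose(Arrays.asList(narrow, full), CostTieSketch::flatCost).name);

        // With the interval-aware cost, the narrow plan wins in either order.
        System.out.println(choose(Arrays.asList(full, narrow), CostTieSketch::intervalAwareCost).name);
        System.out.println(choose(Arrays.asList(narrow, full), CostTieSketch::intervalAwareCost).name);
    }
}
```

The particular scaling factor is arbitrary. Any cost term that strictly
increases with the width of the scanned interval would break the tie in
favour of the plan with the smaller intervals.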

> Druid plans with small intervals should be chosen over full interval scan 
> plus filter
> -------------------------------------------------------------------------------------
>
>                 Key: CALCITE-4213
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4213
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid-adapter
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>
> The problem was observed due to the failure of 
> DruidAdapterIT#testFilterTimestamp.
> {code:sql}
>  select count(*) as c
> from "foodmart"
> where extract(year from "timestamp") = 1997
> and extract(month from "timestamp") in (4, 6)
> {code}
> +Expected+
> {noformat}
> EnumerableInterpreter
>  DruidQuery(table=[[foodmart, foodmart]], 
> intervals=[[1997-04-01T00:00:00.000Z/1997-05-01T00:00:00.000Z, 
> 1997-06-01T00:00:00.000Z/1997-07-01T00:00:00.000Z]], projects=[[0]], 
> groups=[{}], aggs=[[COUNT()]])
> {noformat}
> +Actual+
> {noformat}
> EnumerableInterpreter
>   DruidQuery(table=[[foodmart, foodmart]], 
> intervals=[[1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z]], 
> filter=[AND(=(EXTRACT(FLAG(YEAR), $0), 1997), OR(=(EXTRACT(FLAG(MONTH), $0), 
> 4), =(EXTRACT(FLAG(MONTH), $0), 6)))], groups=[{}], aggs=[[COUNT()]])
> {noformat}
> Observe that the actual plan scans an interval that covers essentially all
> the data, so in most cases it is less efficient than the expected plan.
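
For reference, the intervals in the expected plan follow directly from the
year and month predicates. The sketch below is only an illustration using
java.time, not the adapter's actual filter-to-interval logic; the class and
method names are hypothetical. It maps a year plus a list of months to
half-open [start, end) intervals in the same textual form as the plan.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class MonthIntervalsSketch {
    // Same textual form as the intervals in the plan; timestamps are assumed to be UTC.
    private static final DateTimeFormatter ISO_MILLIS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");

    /** Returns one start/end interval per requested month of the given year. */
    public static List<String> intervals(int year, int... months) {
        List<String> result = new ArrayList<>();
        for (int month : months) {
            YearMonth ym = YearMonth.of(year, month);
            // Start is the first instant of the month; end is the first instant of the next month.
            String start = ym.atDay(1).atStartOfDay().format(ISO_MILLIS);
            String end = ym.plusMonths(1).atDay(1).atStartOfDay().format(ISO_MILLIS);
            result.add(start + "/" + end);
        }
        return result;
    }

    public static void main(String[] args) {
        // extract(year ...) = 1997 and extract(month ...) in (4, 6)
        for (String interval : intervals(1997, 4, 6)) {
            System.out.println(interval);
        }
    }
}
```

This sketch does not merge adjacent months into a single interval; for the
query above that makes no difference, because April and June are not
contiguous.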



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
