[jira] [Commented] (CALCITE-2648) Output collation of EnumerableWindow is not consistent with its implementation

Julian Hyde (JIRA) Fri, 04 Jan 2019 15:24:06 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734670#comment-16734670
 ]


Julian Hyde commented on CALCITE-2648:
--------------------------------------

Don't set the distribution trait. It only relates to distributed execution 
frameworks (e.g. Hadoop and Spark) where there are multiple instances of each 
operator, each processing one slice of the input.

I like the idea of exploiting order. Either use the fact that the input is already 
sorted, or add a Sort. (In Volcano it amounts to the same thing: you ask for a 
RelSubset with the desired sort order, and it may or may not have higher cost 
than the current best.)

How about making a sub-class of EnumerableWindow that exploits the order of the 
input? The code generated would be significantly different, because there is no 
need to buffer rows. And the cost function would be significantly different.
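
To make the "no need to buffer rows" point concrete, here is a minimal sketch in plain Java. It is not Calcite code and not the code EnumerableWindow generates; the class name SortedWindowSketch, the method runningSum and the array-based row layout are all hypothetical. It assumes the input is already sorted by (k, v) and computes SUM(v) OVER (PARTITION BY k ORDER BY v ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) while holding a single accumulator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration only; not part of Calcite. */
class SortedWindowSketch {

  /**
   * Streams a running sum per partition.
   * Input rows are {k, v} and are ASSUMED to be sorted by (k, v).
   * Output rows are {k, v, runningSum}, in the same order as the input.
   */
  public static Iterator<long[]> runningSum(final Iterator<int[]> sortedInput) {
    return new Iterator<long[]>() {
      private boolean started = false;
      private int currentKey;
      private long sum;

      @Override public boolean hasNext() {
        return sortedInput.hasNext();
      }

      @Override public long[] next() {
        int[] row = sortedInput.next();
        if (!started || row[0] != currentKey) {
          // The partition key changed, so a new partition starts here.
          started = true;
          currentKey = row[0];
          sum = 0;
        }
        sum += row[1];
        return new long[] {row[0], row[1], sum};
      }
    };
  }

  public static void main(String[] args) {
    List<int[]> input = Arrays.asList(
        new int[] {20, 1}, new int[] {20, 4}, new int[] {35, 2});
    Iterator<long[]> out = runningSum(input.iterator());
    while (out.hasNext()) {
      System.out.println(Arrays.toString(out.next()));
    }
  }
}
```

Because each output row is emitted as soon as its input row is read, the output keeps the input's order. A frame that extends to UNBOUNDED FOLLOWING, as in the COUNT(*) example quoted below, would still need to see a whole partition before emitting its rows, but with sorted input that is one partition at a time rather than the whole input.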

If there are multiple windows, one which requires ORDER BY x and another that 
requires ORDER BY y, then the typical plan would be scan → sort → window → 
sort → window. (In CALCITE-2764, [~vlsi] and I discussed relational 
expressions that are sorted by X and Y at the same time, but I maintain that 
this only occurs in trivial relations, e.g. VALUES with 1 record, and 
therefore is not useful in practice.)

> Output collation of EnumerableWindow is not consistent with its implementation
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-2648
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2648
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: Hongze Zhang
>            Assignee: Julian Hyde
>            Priority: Major
>
> Here is a case:
> {code:sql}
> select x, COUNT(*) OVER (PARTITION BY x) from (values (20), (35)) as t(x) 
> ORDER BY x
> {code}
> Final plan:
> {code:java}
> EnumerableWindow(window#0=[window(partition {0} order by [] range between 
> UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT()])])
>   EnumerableValues(tuples=[[{ 20 }, { 35 }]])
> {code}
> Output rows:
> {code:java}
> X  |EXPR$1 |
> ---|-------|
> 35 |1      |
> 20 |1      |
> {code}
> EnumerableWindow is supposed to preserve input collations, so the 
> EnumerableSort is removed from the plan. However, the implementation of 
> EnumerableWindow generates non-ordered output (when a PARTITION BY clause is used).
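
The mismatch described in the issue can be reproduced outside Calcite with a small plain-Java sketch. The issue does not show the generated code, so this assumes that partitioning goes through a hash-based map; the class PartitionOrderSketch and the method countOverPartition are hypothetical names. Grouping through a HashMap gives no ordering guarantee on the output, while grouping through a TreeMap emits partitions in key order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Calcite. */
class PartitionOrderSketch {

  /**
   * Evaluates COUNT(*) OVER (PARTITION BY x) by buffering every row into
   * the supplied map, then emitting {x, count} rows partition by partition.
   * The output order is whatever the map's iteration order happens to be.
   */
  public static List<int[]> countOverPartition(
      List<Integer> xs, Map<Integer, List<Integer>> partitions) {
    for (Integer x : xs) {
      partitions.computeIfAbsent(x, k -> new ArrayList<>()).add(x);
    }
    List<int[]> out = new ArrayList<>();
    for (Map.Entry<Integer, List<Integer>> e : partitions.entrySet()) {
      for (Integer x : e.getValue()) {
        out.add(new int[] {x, e.getValue().size()});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> xs = Arrays.asList(20, 35); // input already sorted by x

    // Hash-based partitioning: iteration order is unspecified, so the
    // output need not follow the input's sort order.
    for (int[] row : countOverPartition(xs, new HashMap<>())) {
      System.out.println("hash:   " + Arrays.toString(row));
    }

    // Sorted partitioning: partitions come out in ascending key order.
    for (int[] row : countOverPartition(xs, new TreeMap<>())) {
      System.out.println("sorted: " + Arrays.toString(row));
    }
  }
}
```

An operator that buffers rows this way can only claim to preserve the input collation if its partition map iterates in an order compatible with that collation.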



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
