[
https://issues.apache.org/jira/browse/CALCITE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881722#comment-16881722
]
Haisheng Yuan commented on CALCITE-3181:
----------------------------------------
{quote}
It would make sense to have an implementation of Window (let's call it
StreamingWindow) that exploits the fact that the input is sorted.
{quote}
This is exactly how the Greenplum database and its Orca optimizer implement the
Window operator. In Orca, the physical window operator requests two physical
properties from its input: a hash distribution on [partition keys] and a sort
order on [partition keys, sort keys]. This gives the optimizer more
optimization opportunities, such as reordering windows to avoid a redundant
sort or shuffle.
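
To make the benefit of sorted input concrete, here is a minimal sketch of a
streaming ROW_NUMBER() in Python. It is not Orca or Calcite code; the function
name streaming_row_number and the optional limit parameter are hypothetical,
and rows are assumed to be tuples already sorted by [partition keys, sort keys].

```python
from itertools import groupby
from operator import itemgetter


def streaming_row_number(rows, partition_key, limit=None):
    """Assign ROW_NUMBER() to rows pre-sorted by (partition key, sort key).

    Because the input is sorted, every partition arrives as one contiguous
    run, so the only state kept is a counter for the current partition.
    """
    for _, group in groupby(rows, key=partition_key):
        for rn, row in enumerate(group, start=1):
            if limit is not None and rn > limit:
                # The remaining rows of this partition cannot satisfy
                # rn <= limit; groupby skips them when it advances.
                break
            yield row + (rn,)


# Rows are (x, y), already sorted by x and then y.
rows = [(1, 10), (1, 20), (1, 30), (1, 40), (2, 5), (2, 7)]
top3 = list(streaming_row_number(rows, itemgetter(0), limit=3))
```

The operator never buffers a whole partition, which is what a window
implementation can gain once the optimizer guarantees the required sort order.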
{quote}
if Sort supported partitioned top-n, then we could create a StreamingWindow on
top of a top-n Sort, and the window would not even have to worry that the Sort
is removing rows.
{quote}
Agreed. That way we don't need to change Window semantics. What physical order
property would the top-n Sort deliver? [partition keys + sort keys]? But if
the Sort is a hash-table-based implementation, the [sort keys] order trait
might be wrong.
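
One way to read this concern, sketched in Python under stated assumptions: a
partitioned top-n that keeps one bounded heap per partition in a hash table
emits rows ordered by the sort key within each partition, but the partitions
themselves come out in hash-table order. The function name
hash_partitioned_top_n is hypothetical, rows are assumed to be (x, y) pairs
with numeric y, and this is not how any particular engine implements it.

```python
import heapq
from collections import defaultdict


def hash_partitioned_top_n(rows, n):
    """Keep the n smallest y per x, using one bounded max-heap per partition."""
    heaps = defaultdict(list)  # x -> max-heap of y, stored as negated values
    for x, y in rows:
        heap = heaps[x]
        if len(heap) < n:
            heapq.heappush(heap, -y)
        elif -heap[0] > y:
            # y beats the largest value currently kept for this partition.
            heapq.heapreplace(heap, -y)
    # Partitions are emitted in hash-table order (insertion order for a
    # Python dict), not in sorted order of x.
    for x, heap in heaps.items():
        for y in sorted(-v for v in heap):
            yield (x, y)


rows = [(2, 9), (1, 4), (2, 1), (1, 8), (2, 5), (1, 6), (2, 3)]
out = list(hash_partitioned_top_n(rows, 2))
```

In this run the output is sorted by y inside each x, yet it is not sorted by
[x, y] as a whole, so such an operator could not truthfully advertise a global
[partition keys, sort keys] collation without an extra ordering step.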
> Support limit per group in Window
> ---------------------------------
>
> Key: CALCITE-3181
> URL: https://issues.apache.org/jira/browse/CALCITE-3181
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Haisheng Yuan
> Priority: Major
>
> We have a lot of queries like the following to retrieve top N tuples per
> group:
> {code:java}
> SELECT x, y FROM
> (SELECT x, y, ROW_NUMBER() OVER (PARTITION BY x ORDER BY y)
> AS rn FROM t1) t2 WHERE rn <= 3;
> {code}
> The performance is not good if each group has a lot more tuples than wanted,
> because we will retrieve and sort all the tuples, instead of just doing a
> top-N heap sort.
> To optimize this kind of query, we need to extend Window to support a limit,
> if and only if there is exactly one window function and it is
> {{row_number()}}. We also need a substitute rule to push the limit into the
> Window. Of course, we also need to modify the executor to support this
> optimization (this can be done later).
> {code:java}
> Filter (rn <= 3)
> +- Window (window#0={Partition by x order by y ROW_NUMBER()})
> {code}
> to
> {code:java}
> Filter (rn <= 3)
> +- Window (window#0={Partition by x order by y limit 3 ROW_NUMBER()})
> {code}
> Thoughts? Objections?
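
The rewrite proposed in the issue can be sketched as a small plan
transformation in Python. The Window and Filter classes and the function
push_limit_into_window are hypothetical stand-ins, not Calcite's RelNode API;
the sketch only encodes the stated condition (exactly one window function, and
it is ROW_NUMBER) and handles only the "rn <= N" filter shape shown above.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Window:
    partition_by: Tuple[str, ...]
    order_by: Tuple[str, ...]
    functions: Tuple[Tuple[str, str], ...]  # (function name, output alias)
    limit: Optional[int] = None


@dataclass(frozen=True)
class Filter:
    column: str  # column compared against the bound, e.g. "rn"
    op: str
    bound: int
    input: Window


def push_limit_into_window(f: Filter) -> Filter:
    """Copy the filter bound into the Window as a per-partition limit."""
    w = f.input
    eligible = (
        f.op == "<="
        and w.limit is None
        and len(w.functions) == 1
        and w.functions[0][0] == "ROW_NUMBER"
        and w.functions[0][1] == f.column
    )
    if not eligible:
        return f  # leave the plan unchanged
    # The Filter stays on top, matching the target plan in the issue.
    return replace(f, input=replace(w, limit=f.bound))


plan = Filter("rn", "<=", 3, Window(("x",), ("y",), (("ROW_NUMBER", "rn"),)))
rewritten = push_limit_into_window(plan)
```

Keeping the Filter above the Window means the rewrite stays correct even if an
executor ignores the limit, which fits the note that executor support can come
later.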