[
https://issues.apache.org/jira/browse/CALCITE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403084#comment-16403084
]
Fabian Hueske edited comment on CALCITE-2216 at 3/16/18 11:02 PM:
------------------------------------------------------------------
These fields are not really hidden but just fields emitted from the window
aggregation operator (as requested by a query ({{SELECT TUMBLE_START(...)
...)}}. All time fields are explicit in Flink's SQL implementation.
In some sense, these fields are grouping columns (window start and end can
define a window), but since they do not exist in the input schema we cannot
reference them as such because grouping columns are represented as bitset. We
want to avoid adding a calc before the aggregate that conceptually evaluates
the window function (TUMBLE/HOP/SESSION) and its helper functions
(TUMBLE_START, etc.), since this would make the translation to Flink's
operators more difficult. Moreover, it is not really possible to evaluate a
SESSION function in a Calc and lead to a conceptual mismatch.
We could represent the fields as agg calls, but IMO that's a conceptual
mismatch and work around as well.
I'm aware that Flink's approach is not very elegant, but all alternatives that
I'm aware of have drawbacks as well.
was (Author: fhueske):
These fields are not really hidden but just fields emitted from the window
aggregation operator. All time fields are explicit in Flink's SQL
implementation.
In some sense, these fields are grouping columns (window start and end can
define a window), but since they do not exist in the input schema we cannot
reference them as such because grouping columns are represented as bitset. We
want to avoid adding a calc before the aggregate that conceptually evaluates
the window function (TUMBLE/HOP/SESSION) and its helper functions
(TUMBLE_START, etc.), since this would make the translation to Flink's
operators more difficult. Moreover, it is not really possible to evaluate a
SESSION function in a Calc and lead to a conceptual mismatch.
We could represent the fields as agg calls, but IMO that's a conceptual
mismatch and work around as well.
I'm aware that Flink's approach is not very elegant, but all alternatives that
I'm aware of have drawbacks as well.
> Improve extensibility of AggregateReduceFunctionsRule
> -----------------------------------------------------
>
> Key: CALCITE-2216
> URL: https://issues.apache.org/jira/browse/CALCITE-2216
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.15.0
> Reporter: Fabian Hueske
> Assignee: Julian Hyde
> Priority: Minor
>
> I'm proposing to improve the extensibility of
> {{AggregateReduceFunctionsRule}}. The purpose of the rule is to decompose
> complex aggregation functions like {{VAR_POP}} and {{STDDEV_SAMP}} into
> {{COUNT}} and {{SUM}} functions and compute the original functions in a
> subsequent Calc operator.
> Right now, the rule class provides a {{protected}} method that can be
> overridden to create an {{Aggregate}} with the updated aggregate calls.
> We are using the rule in Flink and have a special {{Aggregate}} Rel for
> group-windowed aggregations ({{GROUP BY TUMBLE/HOP/SESSION}}). Our
> implementation requires to forward some additional fields from the
> {{Aggregate}} for window properties like {{TUMBLE_START}} or {{HOP_END}}. In
> the current form, we cannot extend the rule, because these fields are striped
> off by the {{Calc}} node that is automatically added by the rule.
> I'm proposing to also move the code to create the {{Calc}} into a
> {{protected}} method just like the code to create the new {{Aggregate}}.
> I know, this is a fairly Flink-specific issue, but the code changes are
> minimal (no change in functionality) and it would help us, because we would
> not need to copy the rule and maintain it in Flink.
> I'll open a PR for this. Looking forward to your comments.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)