[
https://issues.apache.org/jira/browse/FLINK-8903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401160#comment-16401160
]
ASF GitHub Bot commented on FLINK-8903:
---------------------------------------
GitHub user fhueske opened a pull request:
https://github.com/apache/flink/pull/5706
[FLINK-8903] [table] Fix VAR_SAMP, VAR_POP, STDEV_SAMP, STDEV_POP functions
on GROUP BY windows.
## What is the purpose of the change
* Fixes the computation of `VAR_SAMP`, `VAR_POP`, `STDDEV_SAMP`,
`STDDEV_POP` aggregations in the context of `GROUP BY` windows (`TUMBLE`,
`HOP`, `SESSION`). Right now, these methods are computed as `AVG`.
## Brief change log
* copy Calcite's `AggregateReduceFunctionsRule` to Flink and improve its
extensibility
* add a `WindowAggregateReduceFunctionsRule` based on the copied
`AggregateReduceFunctionsRule` to decompose the faulty aggregation functions
into `COUNT` and `SUM` functions.
* add restriction to `FlinkLogicalWindowAggregateConverter` to prevent
translation of group window aggregates with failing aggregation functions
* prevent translation of `VAR_SAMP`, `VAR_POP`, `STDDEV_SAMP`, `STDDEV_POP`
in `AggregateUtil`
* add unit tests (plan validation) for batch (SQL, Table API) and stream
(SQL, Table API)
## Verifying this change
* run the added plan tests
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **no**
## Documentation
- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **n/a**
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fhueske/flink tableVarStddevAggFix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5706.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5706
----
commit 517567348b0ec0c23ef0c1dcc05c54a91d5c5671
Author: Fabian Hueske <fhueske@...>
Date: 2018-03-15T20:04:00Z
[FLINK-8903] [table] Fix VAR_SAMP, VAR_POP, STDEV_SAMP, STDEV_POP functions
on GROUP BY windows.
----
> Built-in agg functions VAR_POP, VAR_SAMP, STDEV_POP, STDEV_SAMP are broken in
> Group Windows
> -------------------------------------------------------------------------------------------
>
> Key: FLINK-8903
> URL: https://issues.apache.org/jira/browse/FLINK-8903
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.3.2, 1.5.0, 1.4.2
> Reporter: lilizhao
> Assignee: Fabian Hueske
> Priority: Critical
> Fix For: 1.5.0
>
> Attachments: QQ图片20180312180143.jpg, TableAndSQLTest.java
>
>
> The built-in aggregation functions VAR_POP, VAR_SAMP, STDEV_POP, STDEV_SAMP
> are translated into regular AVG functions if they are applied in the context
> of a Group Window aggregation (\{{GROUP BY TUMBLE/HOP/SESSION}}).
> The reason is that these functions are internally represented as
> {{SqlAvgAggFunction}} but with different {{SqlKind}}. When translating
> Calcite aggregation functions to Flink Table agg functions, we only look at
> the type of the class, not at the value of the {{kind}} field. We did not
> notice that before, because in all other cases (regular {{GROUP BY}} without
> windows or {{OVER}} windows, we have a translation rule
> {{AggregateReduceFunctionsRule}} that decomposes the more complex functions
> into expressions of {{COUNT}} and {{SUM}} functions such that we never
> execute an {{AVG}} Flink function. That rule can only be applied on
> {{LogicalAggregate}}, however, we represent group windows as
> {{LogicalWindowAggregate}}, so the rule does not match.
> We should fix this by:
> 1. restrict the translation to Flink avg functions in {{AggregateUtil}} to
> {{SqlKind.AVG}}.
> 2. implement a rule (hopefully based on {{AggregateReduceFunctionsRule}})
> that decomposes the complex agg functions into the {{SUM}} and {{COUNT}}.
> Step 1. is easy and a quick fix but we would get an exception "Unsupported
> Function" if {{VAR_POP}} is used in a {{GROUP BY}} window.
> Step 2. might be more involved, depending on how difficult it is to port the
> rule.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)