[
https://issues.apache.org/jira/browse/FLINK-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318619#comment-16318619
]
Fabian Hueske edited comment on FLINK-8355 at 1/9/18 3:47 PM:
--------------------------------------------------------------
The motivation for the {{DataSetAggregateWithNullValuesRule}} is to prevent
incorrect aggregation results for empty tables. For instance, the query {{SELECT
COUNT(*) FROM mytable}} should return a row {{(0)}} and not an empty result.
Until now, the built-in aggregations worked correctly with the rule because
they ignore {{null}} values. However, UDAGGs might compute incorrect results if
they do not ignore {{null}} values. Hence, it definitely makes sense to
remove the rule.
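To illustrate the point about {{null}} handling, here is a minimal sketch in plain Java (no Flink dependencies). It models a table as a list with a single nullable column; the class and method names are made up for illustration and do not correspond to Flink classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullRowUnionSketch {

    // Models the effect of the rule: one NULL row is appended to the input.
    static List<Integer> unionNullRow(List<Integer> rows) {
        List<Integer> result = new ArrayList<>(rows);
        result.add(null);
        return result;
    }

    // Behaves like COUNT(col): NULL values are ignored.
    static long countNonNull(List<Integer> rows) {
        return rows.stream().filter(Objects::nonNull).count();
    }

    // Behaves like COUNT(*): every row is counted, including the NULL row.
    static long countStar(List<Integer> rows) {
        return rows.size();
    }

    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(10, 20, 30);
        List<Integer> withNullRow = unionNullRow(table);

        System.out.println(countNonNull(withNullRow)); // prints 3
        System.out.println(countStar(withNullRow));    // prints 4

        // On an empty table the NULL row is what makes the aggregate emit a row at all.
        List<Integer> emptyWithNullRow = unionNullRow(new ArrayList<>());
        System.out.println(countNonNull(emptyWithNullRow)); // prints 0
    }
}
```

An aggregate that ignores {{null}} is unaffected by the extra row, while one that counts every row is off by one.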
A solution would be to add a {{MapPartitionFunction}} with parallelism 1 after
a non-grouped aggregation. The {{MapPartitionFunction}} would simply forward all
input data. If its input is empty, it emits a single result row with all
aggregates in their initial state.
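The forward-or-emit-default logic could be sketched roughly as follows. This is plain Java without Flink dependencies, not a proposed patch: {{forwardOrEmitDefault}} is a hypothetical helper, the {{Consumer}} stands in for Flink's {{Collector}}, and the {{Supplier}} stands in for whatever produces the row of initialized aggregates. In an actual job the same logic would sit in {{MapPartitionFunction#mapPartition(Iterable, Collector)}}, with the operator set to parallelism 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class EmptyInputDefaultSketch {

    // Forwards every input record unchanged. If no record was seen,
    // emits a single default record (the aggregates in their initial state).
    static <T> void forwardOrEmitDefault(
            Iterable<T> values, Consumer<T> out, Supplier<T> initialResult) {
        boolean sawInput = false;
        for (T value : values) {
            sawInput = true;
            out.accept(value);
        }
        if (!sawInput) {
            out.accept(initialResult.get());
        }
    }

    // Small driver (assumption for this sketch) that collects the output into a list.
    static <T> List<T> run(List<T> input, Supplier<T> initialResult) {
        List<T> collected = new ArrayList<>();
        forwardOrEmitDefault(input, collected::add, initialResult);
        return collected;
    }

    public static void main(String[] args) {
        // Empty aggregation result: the initial COUNT value 0 is emitted.
        System.out.println(run(Collections.<Long>emptyList(), () -> 0L)); // prints [0]

        // Non-empty aggregation result: forwarded as is.
        System.out.println(run(Collections.singletonList(3L), () -> 0L)); // prints [3]
    }
}
```

Running with parallelism 1 matters here: only then does the function see the complete aggregation output, so an empty iterator really means an empty input.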
> DataSet Should not union a NULL row for AGG without GROUP BY clause.
> --------------------------------------------------------------------
>
> Key: FLINK-8355
> URL: https://issues.apache.org/jira/browse/FLINK-8355
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.5.0
> Reporter: sunjincheng
>
> Currently, {{DataSetAggregateWithNullValuesRule}} UNIONs a NULL row into the
> input of a non-grouped aggregate query. Once {{CountAggFunction}} supports
> {{COUNT(*)}} (FLINK-8325), the result will be incorrect.
> For example, if table {{T1}} has 3 records and we run the following SQL on a
> DataSet:
> {code}
> SELECT COUNT(*) AS cnt FROM T1 -- cnt = 4 (incorrect).
> {code}