[ 
https://issues.apache.org/jira/browse/FLINK-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318619#comment-16318619
 ] 

Fabian Hueske edited comment on FLINK-8355 at 1/9/18 3:47 PM:
--------------------------------------------------------------

The motivation for the {{DataSetAggregateWithNullValuesRule}} is to prevent 
incorrect aggregation results for empty tables. For instance, the query {{SELECT 
COUNT(*) FROM mytable}} should return a row {{(0)}} and not an empty result. 

Until now, the built-in aggregations worked correctly because they 
ignore {{null}} values. However, UDAGGs might compute incorrect results if 
they do not ignore {{null}} values. Hence, it definitely makes sense to 
remove the rule.
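
To illustrate the difference, here is a minimal sketch in plain Java (no Flink dependency). The class and method names are hypothetical; the input models three real records plus the extra {{null}} row that the rule adds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullRowCountSketch {
    /** Counts only non-null values, like an aggregate that ignores nulls. */
    static long countIgnoringNulls(List<Integer> values) {
        return values.stream().filter(Objects::nonNull).count();
    }

    /** Counts every row, null or not, like COUNT(*). */
    static long countAllRows(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        // Three real records plus the extra null row (assumed input).
        List<Integer> withNullRow = Arrays.asList(1, 2, 3, null);
        System.out.println(countIgnoringNulls(withNullRow)); // prints 3
        System.out.println(countAllRows(withNullRow));       // prints 4
    }
}
```

An aggregate that skips {{null}} input is unaffected by the extra row, while one that counts every row is off by one.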

A solution would be to add a {{MapPartitionFunction}} with parallelism 1 after 
a groupless aggregation. The {{MapPartitionFunction}} would simply forward all 
input data. If the input is empty, it would emit a single result row with all 
aggregates in their initial state.
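
The forwarding logic could look roughly like the sketch below. It is written in plain Java so that it runs without Flink: {{Iterable}} and {{Consumer}} stand in for the input iterator and the {{Collector}} of a real {{MapPartitionFunction}}, and the names {{EmptyInputFallback}}, {{forwardOrDefault}} and {{defaultRow}} are hypothetical. It only yields a single default row overall if it runs with parallelism 1, as proposed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class EmptyInputFallback {
    /**
     * Forwards every input row unchanged. If no row was seen at all,
     * emits defaultRow once (the aggregates in their initial state).
     */
    static <T> void forwardOrDefault(Iterable<T> rows, T defaultRow, Consumer<T> out) {
        boolean sawInput = false;
        for (T row : rows) {
            sawInput = true;
            out.accept(row);
        }
        if (!sawInput) {
            out.accept(defaultRow);
        }
    }

    public static void main(String[] args) {
        // Non-empty input: the aggregation result (3) is passed through.
        List<Long> nonEmpty = new ArrayList<>();
        forwardOrDefault(Arrays.asList(3L), 0L, nonEmpty::add);
        System.out.println(nonEmpty); // prints [3]

        // Empty input: a single row with the initial COUNT value (0) is emitted.
        List<Long> empty = new ArrayList<>();
        forwardOrDefault(Collections.<Long>emptyList(), 0L, empty::add);
        System.out.println(empty); // prints [0]
    }
}
```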



> DataSet Should not union a NULL row for AGG without GROUP BY clause.
> --------------------------------------------------------------------
>
>                 Key: FLINK-8355
>                 URL: https://issues.apache.org/jira/browse/FLINK-8355
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: sunjincheng
>
> Currently {{DataSetAggregateWithNullValuesRule}} will UNION a NULL row for 
> non-grouped aggregate queries. Once {{CountAggFunction}} supports 
> {{COUNT(*)}} (FLINK-8325), the result will be incorrect.
> For example, if table {{T1}} has 3 records and we run the following SQL on 
> DataSet: 
> {code}
> SELECT COUNT(*) AS cnt FROM T1 -- cnt = 4 (incorrect, expected 3)
> {code}


