[
https://issues.apache.org/jira/browse/FLINK-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898964#comment-15898964
]
ASF GitHub Bot commented on FLINK-5722:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/3471#discussion_r104613673
--- Diff:
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/rules/dataSet/DataSetAggregateRule.scala
---
@@ -42,6 +41,14 @@ class DataSetAggregateRule
return false
}
+ // distinct is translated into dedicated operator
+ if (agg.getAggCallList.isEmpty &&
+ agg.getGroupCount == agg.getRowType.getFieldCount &&
--- End diff --
During testing I observed a case where a projection wasn't pushed down.
Without this check, the grouping would happen on a subset of fields and
only those would be emitted. It is not possible to change the input and output
types of a ReduceFunction (which is used to implement Distinct in a
hash-combinable way), I check that the input and output types are identical.
> Implement DISTINCT as dedicated operator
> ----------------------------------------
>
> Key: FLINK-5722
> URL: https://issues.apache.org/jira/browse/FLINK-5722
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Fabian Hueske
> Assignee: Fabian Hueske
>
> DISTINCT is currently implemented for batch Table API / SQL as an aggregate
> which groups on all fields. Grouped aggregates are implemented as GroupReduce
> with sort-based combiner.
> This operator can be more efficiently implemented by using ReduceFunction and
> hinting a HashCombine strategy. The same ReduceFunction can be used for all
> DISTINCT operations and can be assigned with appropriate forward field
> annotations.
> We would need a custom conversion rule which translates distinct aggregations
> (grouping on all fields and returning all fields) into a custom
> DataSetRelNode.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)