[
https://issues.apache.org/jira/browse/FLINK-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299736#comment-15299736
]
ASF GitHub Bot commented on FLINK-3941:
---------------------------------------
Github user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/2025#discussion_r64541694
--- Diff:
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetUnion.scala
---
@@ -69,16 +73,23 @@ class DataSetUnion(
rows + metadata.getRowCount(child)
}
- planner.getCostFactory.makeCost(rowCnt, 0, 0)
+ planner.getCostFactory.makeCost(
+ rowCnt,
+ if (all) 0 else rowCnt,
+ if (all) 0 else rowCnt)
}
override def translateToPlan(
tableEnv: BatchTableEnvironment,
expectedType: Option[TypeInformation[Any]]): DataSet[Any] = {
- val leftDataSet =
left.asInstanceOf[DataSetRel].translateToPlan(tableEnv)
- val rightDataSet =
right.asInstanceOf[DataSetRel].translateToPlan(tableEnv)
- leftDataSet.union(rightDataSet).asInstanceOf[DataSet[Any]]
+ val leftDataSet =
left.asInstanceOf[DataSetRel].translateToPlan(tableEnv, expectedType)
+ val rightDataSet =
right.asInstanceOf[DataSetRel].translateToPlan(tableEnv, expectedType)
+ if (all) {
+ leftDataSet.union(rightDataSet).asInstanceOf[DataSet[Any]]
+ } else {
+ leftDataSet.union(rightDataSet).distinct().asInstanceOf[DataSet[Any]]
--- End diff --
Oh, yes. Completely forgot about that rule... 😊
So, we already supported the non-all union for SQL. Only the Table API was
missing the `union()` method.
I think there are two ways to continue:
- remove the `UnionToDistinctRule` from `FlinkRuleSets`
- revert the changes on `DataSetUnion` (except for pushing down the
`expectedType`) and `DataSetUnionRule`.
I am fine either way.
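To illustrate why only one of the two is needed, here is a minimal sketch using plain Scala collections as stand-ins for DataSets. It does not use any Flink or Calcite API; `UnionSketch`, `Row`, and the three helper methods are hypothetical names made up for this example. It assumes that `UnionToDistinctRule` rewrites a non-all union into a UNION ALL followed by an aggregate that groups on all fields, and shows that this yields the same set of rows as applying `distinct()` directly after the union.

```scala
// Illustration only: Seq stands in for DataSet, and no planner is involved.
object UnionSketch {
  // Hypothetical row type with two fields.
  type Row = (Int, String)

  // UNION ALL: concatenate both inputs and keep duplicates.
  def unionAll(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    left ++ right

  // Variant from the diff: deduplicate directly after the union.
  def unionDistinct(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    unionAll(left, right).distinct

  // Variant assumed for the rule: UNION ALL, then group on all fields
  // and keep one row per group.
  def unionViaAggregate(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    unionAll(left, right).groupBy(identity).keys.toSeq

  def main(args: Array[String]): Unit = {
    val left = Seq((1, "a"), (2, "b"), (2, "b"))
    val right = Seq((2, "b"), (3, "c"))

    // Both variants produce the same set of rows (row order may differ).
    assert(unionDistinct(left, right).toSet == unionViaAggregate(left, right).toSet)
    println(unionDistinct(left, right).sorted.mkString(", "))
  }
}
```

If both the rule and the `distinct()` in `DataSetUnion` stayed in place, the non-all branch of `DataSetUnion` would presumably never be reached for plans that went through the rule, which is why keeping just one of them should be enough.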
> Add support for UNION (with duplicate elimination)
> --------------------------------------------------
>
> Key: FLINK-3941
> URL: https://issues.apache.org/jira/browse/FLINK-3941
> Project: Flink
> Issue Type: New Feature
> Components: Table API
> Affects Versions: 1.1.0
> Reporter: Fabian Hueske
> Assignee: Yijie Shen
> Priority: Minor
>
> Currently, only UNION ALL is supported by Table API and SQL.
> UNION (with duplicate elimination) can be supported by applying a
> {{DataSet.distinct()}} after the union on all fields. This issue includes:
> - Extending {{DataSetUnion}}
> - Relaxing {{DataSetUnionRule}} to translate non-all unions.
> - Extending the Table API with a union() method.