Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4471#discussion_r132468046

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/util/UpdatingPlanChecker.scala ---
    @@ -56,17 +59,20 @@ object UpdatingPlanChecker {
       }

       /** Identifies unique key fields in the output of a RelNode. */
    -  private class UniqueKeyExtractor extends RelVisitor {
    -
    -    var keys: Option[Array[String]] = None
    +  private class UniqueKeyExtractor {

    -    override def visit(node: RelNode, ordinal: Int, parent: RelNode): Unit = {
    +    // visit() function will return a tuple, the first element of tuple is the key, the second is
    +    // the key's corresponding ancestor. Ancestors are used to identify same keys, for example:
    --- End diff --

    I think a more common term than "ancestor" is "equivalence group". In principle, it is used to identify fields that are equivalent.

    I think we should not point to a field in the input of an operator but rather choose one of the fields in the current input as the "id" of the equivalence group. For example, if we have a table `(a, b, c)` and do `select(a, a as x, b as y, b as z)`, I would resolve these fields as `[(a, a), (x, a), (y, y), (z, y)]`, i.e., always use the lexicographically smallest attribute as the common group id.

    IMO, this convention is easier to handle if we have to work with equivalence groups that are joined by equi-predicates.
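To make the convention concrete, here is a minimal sketch of how the group ids in the example could be computed. It is not the code in the PR: `EquivalenceGroups` and `resolve` are hypothetical names, and the projection is assumed to be given as plain `(outputName, inputName)` pairs rather than as a Calcite `RelNode`.

```scala
// Hypothetical helper, not part of UpdatingPlanChecker.
object EquivalenceGroups {

  /**
    * `projection` lists (outputName, inputName) pairs in output order.
    * Returns (outputName, groupId) pairs, where groupId is the
    * lexicographically smallest output name forwarding the same input field.
    */
  def resolve(projection: Seq[(String, String)]): Seq[(String, String)] = {
    // For every input field, collect the output names that forward it
    // and pick the smallest one as the id of the equivalence group.
    val groupId: Map[String, String] = projection
      .groupBy { case (_, in) => in }
      .map { case (in, outs) => in -> outs.map(_._1).min }

    // Tag each output field with the id of its group, keeping output order.
    projection.map { case (out, in) => (out, groupId(in)) }
  }
}
```

Under these assumptions, `EquivalenceGroups.resolve(Seq("a" -> "a", "x" -> "a", "y" -> "b", "z" -> "b"))` yields `[(a, a), (x, a), (y, y), (z, y)]`, matching the example above. Because every id is itself a field of the same row, two groups connected by an equi-predicate can be merged by keeping the smaller of their two ids.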