Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4471#discussion_r132468046
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/util/UpdatingPlanChecker.scala ---
    @@ -56,17 +59,20 @@ object UpdatingPlanChecker {
       }
     
       /** Identifies unique key fields in the output of a RelNode. */
    -  private class UniqueKeyExtractor extends RelVisitor {
    -
    -    var keys: Option[Array[String]] = None
    +  private class UniqueKeyExtractor {
     
    -    override def visit(node: RelNode, ordinal: Int, parent: RelNode): Unit = {
    +    // visit() function will return a tuple, the first element of tuple is the key, the second is
    +    // the key's corresponding ancestor. Ancestors are used to identify same keys, for example:
    --- End diff --
    
    I think a more common term than "ancestor" is "equivalence group". In 
principle, this is used to identify fields which are equivalent. I think we 
should not point to a field in the input of an operator but rather choose one of 
the fields in the current output as the "id" of the equivalence group. For example, 
if we have a table `(a, b, c)` and do `select(a, a as x, b as y, b as z)`, I 
would resolve these fields as `[(a, a), (x, a), (y, y), (z, y)]`, i.e., always 
use the lexicographically smallest attribute as the common group id. 
    
    IMO, this convention is easier to handle if we have to work with 
equivalence groups which are joined by equi-predicates.
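
A minimal sketch of this convention, not the actual `UpdatingPlanChecker` code. The names `EquivalenceGroups`, `Projection`, `resolve`, and `merge` are hypothetical and only serve the illustration; the sketch assumes every output field forwards exactly one input field and that field names are unique.

```scala
object EquivalenceGroups {

  /** Hypothetical model of one projected field: its output name and the input field it forwards. */
  final case class Projection(outputName: String, inputField: String)

  /** Pairs every output field with the smallest output name that forwards the same input field. */
  def resolve(projections: Seq[Projection]): Seq[(String, String)] = {
    // One group per input field; its id is the lexicographically smallest output name.
    val groupIds: Map[String, String] =
      projections.groupBy(_.inputField).map {
        case (input, group) => input -> group.map(_.outputName).min
      }
    projections.map(p => (p.outputName, groupIds(p.inputField)))
  }

  /** Merges the groups of `left` and `right`, as an equi-predicate `left = right` would. */
  def merge(fields: Seq[(String, String)], left: String, right: String): Seq[(String, String)] = {
    val ids = fields.toMap
    val (l, r) = (ids(left), ids(right))
    // The smaller of the two ids is also the smallest member of the merged group.
    val merged = Ordering[String].min(l, r)
    fields.map { case (name, id) => (name, if (id == l || id == r) merged else id) }
  }
}
```

With the projections `a`, `a as x`, `b as y`, `b as z`, `resolve` yields `[(a, a), (x, a), (y, y), (z, y)]`. Because each group id is already the smallest member of its group, merging two groups only requires comparing the two ids, which is why the convention is convenient for equi-predicates.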

