Jefffrey opened a new issue, #8379:
URL: https://github.com/apache/arrow-datafusion/issues/8379

   ### Is your feature request related to a problem or challenge?
   
   Spark has a concept of `ExprId`, which is used to uniquely identify 
named expressions:
   
   
https://github.com/apache/spark/blob/9bb358b51e30b5041c0cd20e27cf995aca5ed4c7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L41-L57
   
   ```scala
   /**
    * A globally unique id for a given named expression.
    * Used to identify which attribute output by a relation is being
    * referenced in a subsequent computation.
    *
    * The `id` field is unique within a given JVM, while the `uuid` is used to uniquely identify JVMs.
    */
   case class ExprId(id: Long, jvmId: UUID) {
   
     override def equals(other: Any): Boolean = other match {
       case ExprId(id, jvmId) => this.id == id && this.jvmId == jvmId
       case _ => false
     }
   
     override def hashCode(): Int = id.hashCode()
   
   }
   ```
   
   Is it worth attempting to introduce something similar in DataFusion?
   
   Some optimizer rules compare columns directly by name, which leads to 
bugs when duplicate names appear, such as 
https://github.com/apache/arrow-datafusion/issues/8374
   
   If during the analysis of a plan we can assign unique numeric IDs for 
columns, we could check for column equality based on these IDs and not need to 
compare string names.
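   
   As a rough illustration of the idea (a minimal sketch only; 
`ColumnId`, `ColumnIdGenerator` and `ResolvedColumn` are hypothetical names 
and are not existing DataFusion types), equality could be defined on the ID 
so that two columns sharing a name remain distinguishable:
   
   ```rust
   use std::sync::atomic::{AtomicU64, Ordering};
   
   /// Hypothetical unique column identifier (not part of DataFusion today).
   #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
   pub struct ColumnId(u64);
   
   /// Hypothetical generator that hands out fresh IDs during plan analysis.
   #[derive(Debug, Default)]
   pub struct ColumnIdGenerator {
       next: AtomicU64,
   }
   
   impl ColumnIdGenerator {
       pub fn new() -> Self {
           Self::default()
       }
   
       /// Returns an ID that this generator has not handed out before.
       pub fn next_id(&self) -> ColumnId {
           ColumnId(self.next.fetch_add(1, Ordering::Relaxed))
       }
   }
   
   /// Hypothetical column reference: the name is kept for display only.
   #[derive(Debug, Clone)]
   pub struct ResolvedColumn {
       pub name: String,
       pub id: ColumnId,
   }
   
   // Equality looks only at the ID, never at the string name.
   impl PartialEq for ResolvedColumn {
       fn eq(&self, other: &Self) -> bool {
           self.id == other.id
       }
   }
   
   impl Eq for ResolvedColumn {}
   
   fn main() {
       let ids = ColumnIdGenerator::new();
       // Two columns named "a", e.g. from the two sides of a self-join.
       let left = ResolvedColumn { name: "a".to_string(), id: ids.next_id() };
       let right = ResolvedColumn { name: "a".to_string(), id: ids.next_id() };
   
       assert_eq!(left.name, right.name); // the names collide
       assert_ne!(left, right); // the IDs still tell them apart
       assert_eq!(left, left.clone()); // a copy of the same reference is equal
   }
   ```
   
   Unlike Spark's `ExprId`, this sketch has no per-process component 
(the `jvmId` above), so IDs would only be unique within a single generator.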
   
   The obvious downside is that this looks like a large refactoring effort, 
not to mention the breaking changes it would cause.
   
   ### Describe the solution you'd like
   
   Consider introducing unique IDs for columns/expressions to potentially 
simplify optimization/planning code
   
   ### Describe alternatives you've considered
   
   Don't do this (large refactoring effort? breaking changes?)
   
   ### Additional context
   
   Just a thought I had bouncing around in my head. I would appreciate 
hearing more thoughts on this (even if it seems infeasible), or learning 
whether there has already been prior discussion on a similar topic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

