(spark) branch master updated: [SPARK-50489][SQL][PYTHON][FOLLOW-UP] Add applyInArrow in `DeduplicateRelations#collectConflictPlans`

gurwls223 Thu, 05 Dec 2024 16:46:40 -0800

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new a435b2e634bd [SPARK-50489][SQL][PYTHON][FOLLOW-UP] Add applyInArrow in 
`DeduplicateRelations#collectConflictPlans`
a435b2e634bd is described below

commit a435b2e634bd4fc675dee790961d38423e9bef6a
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Fri Dec 6 09:46:16 2024 +0900

    [SPARK-50489][SQL][PYTHON][FOLLOW-UP] Add applyInArrow in 
`DeduplicateRelations#collectConflictPlans`
    
    ### What changes were proposed in this pull request?
    Add applyInArrow in `DeduplicateRelations#collectConflictPlans`
    
    ### Why are the changes needed?
    In https://github.com/apache/spark/pull/49056, I forgot to add 
`applyInArrow` in `DeduplicateRelations#collectConflictPlans`
    
    ### Does this PR introduce _any_ user-facing change?
    no
    
    ### How was this patch tested?
    tests added in https://github.com/apache/spark/pull/49056
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #49069 from zhengruifeng/apply_in_arrow_rule.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../spark/sql/catalyst/analysis/DeduplicateRelations.scala   | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala
index 52be631d94d8..8398fb8d1e83 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala
@@ -392,12 +392,24 @@ object DeduplicateRelations extends Rule[LogicalPlan] {
         newVersion.copyTagsFrom(oldVersion)
         Seq((oldVersion, newVersion))
 
+      case oldVersion @ FlatMapGroupsInArrow(_, _, output, _)
+        if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
+        val newVersion = oldVersion.copy(output = output.map(_.newInstance()))
+        newVersion.copyTagsFrom(oldVersion)
+        Seq((oldVersion, newVersion))
+
       case oldVersion @ FlatMapCoGroupsInPandas(_, _, _, output, _, _)
         if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
         val newVersion = oldVersion.copy(output = output.map(_.newInstance()))
         newVersion.copyTagsFrom(oldVersion)
         Seq((oldVersion, newVersion))
 
+      case oldVersion @ FlatMapCoGroupsInArrow(_, _, _, output, _, _)
+        if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
+        val newVersion = oldVersion.copy(output = output.map(_.newInstance()))
+        newVersion.copyTagsFrom(oldVersion)
+        Seq((oldVersion, newVersion))
+
       case oldVersion @ MapInPandas(_, output, _, _, _)
         if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
         val newVersion = oldVersion.copy(output = output.map(_.newInstance()))


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(spark) branch master updated: [SPARK-50489][SQL][PYTHON][FOLLOW-UP] Add applyInArrow in `DeduplicateRelations#collectConflictPlans`

Reply via email to