Re: [PR] feat: Support HashJoin operator [arrow-datafusion-comet]

via GitHub Tue, 12 Mar 2024 13:40:57 -0700


viirya commented on code in PR #194:
URL: https://github.com/apache/arrow-datafusion-comet/pull/194#discussion_r1522083809



##########
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala:
##########
@@ -1836,6 +1838,48 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {
           }
         }
 
+      case join: ShuffledHashJoinExec if isCometOperatorEnabled(op.conf, "hash_join") =>
+        if (join.buildSide == BuildRight) {
+          // DataFusion HashJoin assumes build side is always left.
+          // TODO: support BuildRight

Review Comment:
   Yes. In DataFusion, only the left side can be the build side. In Spark, 
the HashJoin operator takes a build side parameter that indicates which side 
is the build side, and the operator handles either case internally. So 
currently we cannot simply create a DataFusion HashJoin operator with the 
right side as the build side.
   
   The left and right sides can be swapped, but only if we also swap the 
outputs and rebind the columns in the join keys and the join filter. I'd 
prefer to relax the build side constraint in DataFusion rather than do the 
swap in Comet.
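
   To illustrate what such a swap would involve, here is a minimal sketch. 
It is not Comet, Spark, or DataFusion code: `HashJoinSpec`, `FilterColumn`, 
`SwappedJoin`, and `swapToBuildLeft` are hypothetical names on a simplified 
model, and it assumes an inner join (a real implementation would also have to 
mirror the join type, e.g. left outer becomes right outer).

```scala
object BuildSideSwap {
  // Hypothetical, simplified model of a hash join; not a real Comet/Spark/DataFusion API.
  sealed trait Side
  case object LeftSide extends Side
  case object RightSide extends Side

  // A column referenced by the join filter: which input it reads and its index there.
  final case class FilterColumn(side: Side, index: Int)

  final case class HashJoinSpec(
      leftWidth: Int,                    // number of columns produced by the left input
      rightWidth: Int,                   // number of columns produced by the right input
      keys: Seq[(Int, Int)],             // (left column index, right column index) pairs
      filterColumns: Seq[FilterColumn],  // columns referenced by the join filter
      buildSide: Side)

  // The rewritten join plus the projection that restores the original column order.
  final case class SwappedJoin(spec: HashJoinSpec, outputProjection: Seq[Int])

  private def flip(side: Side): Side = side match {
    case LeftSide  => RightSide
    case RightSide => LeftSide
  }

  // Rewrites a BuildRight join into an equivalent BuildLeft join (inner joins only).
  def swapToBuildLeft(join: HashJoinSpec): SwappedJoin = {
    require(join.buildSide == RightSide, "swap is only needed for BuildRight")
    val swapped = HashJoinSpec(
      leftWidth = join.rightWidth,
      rightWidth = join.leftWidth,
      // Each key pair is mirrored because the inputs traded places.
      keys = join.keys.map { case (l, r) => (r, l) },
      // Filter columns keep their index but now belong to the opposite input.
      filterColumns = join.filterColumns.map(c => c.copy(side = flip(c.side))),
      buildSide = LeftSide)
    // The swapped join emits [old right columns ++ old left columns]; this
    // projection selects the old left columns first to restore [left ++ right].
    val projection: Seq[Int] =
      (join.rightWidth until join.rightWidth + join.leftWidth) ++ (0 until join.rightWidth)
    SwappedJoin(swapped, projection)
  }
}
```

   Even in this reduced form, the swap touches the keys, the filter bindings, 
and the output order, which is the bookkeeping that relaxing the constraint 
in DataFusion would avoid.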



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

