liujiayi771 commented on code in PR #12264:
URL: https://github.com/apache/gluten/pull/12264#discussion_r3425134897
##########
backends-velox/src/main/scala/org/apache/gluten/execution/HashJoinExecTransformer.scala:
##########
@@ -197,4 +197,9 @@ case class BroadcastHashJoinContext(
buildHashTableId: String,
isNullAwareAntiJoin: Boolean = false,
bloomFilterPushdownSize: Long,
- buildHashTableTimeMetric: Option[SQLMetric] = None)
+ buildHashTableTimeMetric: Option[SQLMetric] = None) {
+ def droppedDuplicates: Boolean = {
Review Comment:
Should we also take the filter case into account here?
Native `dropDuplicates_` (Velox
[PlanNode.h#L3247](https://github.com/facebookincubator/velox/blob/f0d94f82c5a006067578569a18ad48933ff81e93/velox/core/PlanNode.h#L3247)
/ Gluten HashTableBuilder) is `!withFilter && (semi || anti)`. With a filter,
the table keeps duplicates and stores payload columns. Since the join condition
isn't part of `HashedRelationBroadcastMode`, a filtered and a non-filtered
semi/anti join could reuse the same broadcast exchange. With the current
`SEMI || ANTI` flag both get `droppedDuplicates = true`, so one might clone the
other's key-only, deduplicated table.
Maybe `droppedDuplicates` should also include the `!hasJoinFilter`
condition? Just want to make sure I'm not missing something here. And please
add a test case for this as well.
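
Roughly what I have in mind, as a minimal sketch only. `JoinKind`,
`JoinContextSketch`, and `hasJoinFilter` are hypothetical stand-ins, not the
actual fields of `BroadcastHashJoinContext`; the point is just that the flag
follows the native `!withFilter && (semi || anti)` rule:

```scala
// Hypothetical, simplified stand-ins for illustration; not the real Gluten types.
sealed trait JoinKind
case object Inner extends JoinKind
case object LeftSemi extends JoinKind
case object LeftAnti extends JoinKind

final case class JoinContextSketch(joinKind: JoinKind, hasJoinFilter: Boolean) {
  // Assumed to mirror the native rule quoted above:
  // duplicates are dropped only for semi/anti joins that have no join filter.
  def droppedDuplicates: Boolean =
    !hasJoinFilter && (joinKind == LeftSemi || joinKind == LeftAnti)
}
```

With this shape, a filtered and a non-filtered semi join that share the same
build side report different `droppedDuplicates` values, so a test could assert
that the two do not end up sharing one hash table.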
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]