brijrajk commented on code in PR #12151:
URL: https://github.com/apache/gluten/pull/12151#discussion_r3500310173
##########
gluten-ut/spark40/src/test/resources/backends-velox/tpcds-plan-stability/gluten-approved-plans-modified/q59.sf100/simplified.txt:
##########
@@ -23,52 +23,42 @@ VeloxColumnarToRow
InputIteratorTransformer
InputAdapter
ColumnarBroadcastExchange #2
- WholeStageCodegenTransformer (3)
- FilterExecTransformer
[d_date_sk,d_week_seq]
- Subquery #1
- VeloxColumnarToRow
-
WholeStageCodegenTransformer (2)
-
RegularHashAggregateExecTransformer [bloom_filter_agg(xxhash64(d_week_seq, 42),
335, 8990, 0, 0),bloomFilter,buf,buf]
-
InputIteratorTransformer
- InputAdapter
- ColumnarExchange
#3
-
VeloxResizeBatches
-
WholeStageCodegenTransformer (1)
-
FlushableHashAggregateExecTransformer [_pre_x] [buf,buf,buf]
+ RowToVeloxColumnar
+ WholeStageCodegen (1)
+ Filter [d_date_sk,d_week_seq]
+ Subquery #1
+ ObjectHashAggregate
[buf] [bloom_filter_agg(xxhash64(d_week_seq, 42), 335, 8990, 0,
0),bloomFilter,buf]
+ VeloxColumnarToRow
+ ColumnarExchange #3
+ VeloxResizeBatches
+
RowToVeloxColumnar
+
ObjectHashAggregate [d_week_seq] [buf,buf]
+
VeloxColumnarToRow
+
WholeStageCodegenTransformer (1)
Review Comment:
@zhztheplayer Yes, `ObjectHashAggregate` is in the final executed plan, and
the fallback is intentional.
**`bloom_filter_agg` is the attribute name, not the physical function
class.** It comes from `resultExpressions`, which is frozen when
`ObjectHashAggregateExec` is first built with vanilla `BloomFilterAggregate`.
Since `transformAllExpressions` only patches `aggregateExpressions` and not
`resultExpressions`, the name stays the same regardless of any later
substitution.
**The physical function inside is `VeloxBloomFilterAggregate`, not
vanilla.** `BloomFilterRuntimeFilterRewriteRule` substitutes it via
`subq.withNewPlan(rewrittenPlan)`. `eval()` calls `serialize(buffer)`
unconditionally, producing version=1 bytes that `VeloxBloomFilterMightContain`
reads correctly.
**JVM mode is unavoidable here.** `PlanSubqueries` (pos 2) prepares the
subquery before `ApplyColumnarRulesAndInsertTransitions` (pos 10) runs, so
`HeuristicTransform` sees vanilla `BloomFilterAggregate` with no Substrait
mapping and emits `ObjectHashAggregateExec`. By the time our rule patches it at
pos 10 on the main plan, `HeuristicTransform` has already finished.
**Substituting earlier would break SPARK-54336.** If we substitute during
subquery preparation, it also fires for `might_contain(empty_subquery,
literal)`. Since `VeloxBloomFilterAggregate` has no cardinality guard, empty
input returns bytes instead of `null`.
Added `GlutenBloomFilterFallbackSuite."GLUTEN-12013:
VeloxBloomFilterAggregate in JVM subquery produces correct Velox bytes"` that
inspects `collectWithSubqueries(executedPlan)` for
`ObjectHashAggregateExec(VeloxBloomFilterAggregate)` and verifies correct query
results.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]