rui-mo commented on code in PR #11392:
URL: https://github.com/apache/incubator-gluten/pull/11392#discussion_r2733087767
##########
backends-velox/src/test/scala/org/apache/gluten/execution/VeloxHashJoinSuite.scala:
##########
@@ -307,4 +307,75 @@ class VeloxHashJoinSuite extends VeloxWholeStageTransformerSuite {
runQueryAndCompare(q5) { _ => }
}
}
+
+ test("Hash probe dynamic filter pushdown") {
+ withSQLConf(
+ VeloxConfig.HASH_PROBE_DYNAMIC_FILTER_PUSHDOWN_ENABLED.key -> "true",
+ VeloxConfig.HASH_PROBE_BLOOM_FILTER_PUSHDOWN_MAX_SIZE.key -> "1048576"
+ ) {
+ withTable("probe_table", "build_table") {
+ spark.sql("""
+ CREATE TABLE probe_table USING PARQUET
+ AS SELECT id as a FROM range(20000)
+ """)
+
+ spark.sql("""
+ CREATE TABLE build_table USING PARQUET
+ AS SELECT id * 1000 as b FROM range(10001)
+ """)
+
+ runQueryAndCompare(
+ "SELECT a FROM probe_table JOIN build_table ON a = b"
+ ) {
+ df =>
+ val join = find(df.queryExecution.executedPlan) {
+ case _: ShuffledHashJoinExecTransformer => true
+ case _ => false
+ }
+ assert(join.isDefined)
+ val metrics = join.get.metrics
+
+ assert(metrics.contains("hashProbeDynamicFiltersProduced"))
+ assert(metrics("hashProbeDynamicFiltersProduced").value > 0)
Review Comment:
I investigated the cause of the test failure and identified the following issues:
1) The test uses a broadcast hash join, which causes the assertion at L335 to fail. We can set `spark.sql.autoBroadcastJoinThreshold` to -1 to force a shuffled hash join instead.
2) After applying that change, the metrics assertion still fails because `metrics("hashProbeDynamicFiltersProduced").value` is zero. In this scenario the Velox plan has `ValueStream` operators as the children of the HashJoin, which is the typical join plan in Gluten when a shuffle exists, but `ValueStream` does not support dynamic filter pushdown. Dynamic filters only take effect when the HashProbe child is a TableScan. I'm not sure whether such a case can be constructed in a Gluten unit test.
```
-- Project[4][expressions: (n4_2:BIGINT, "n3_2")] -> n4_2:BIGINT
  -- Project[3][expressions: (n3_2:BIGINT, "n0_0"), (n3_3:BIGINT, "n1_0")] -> n3_2:BIGINT, n3_3:BIGINT
    -- HashJoin[2][INNER n0_0=n1_0] -> n0_0:BIGINT, n1_0:BIGINT
      -- ValueStream[0][] -> n0_0:BIGINT
      -- ValueStream[1][] -> n1_0:BIGINT
```
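The fix for issue 1) could be sketched as below (a sketch only, not the definitive change; `SQLConf.AUTO_BROADCASTJOIN_THRESHOLD` is Spark's constant for `spark.sql.autoBroadcastJoinThreshold`, assuming `org.apache.spark.sql.internal.SQLConf` is imported in the suite):

```scala
// Sketch: disable broadcast joins so the planner picks a shuffled hash join
// (ShuffledHashJoinExecTransformer), which the test's plan assertion expects.
withSQLConf(
  VeloxConfig.HASH_PROBE_DYNAMIC_FILTER_PUSHDOWN_ENABLED.key -> "true",
  VeloxConfig.HASH_PROBE_BLOOM_FILTER_PUSHDOWN_MAX_SIZE.key -> "1048576",
  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1" // force shuffled hash join
) {
  // ... existing table setup and metric assertions unchanged ...
}
```

Note that issue 2) would remain even with this config, since the HashProbe children are still `ValueStream` operators after the shuffle.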
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]