zhztheplayer commented on code in PR #5433:
URL: https://github.com/apache/incubator-gluten/pull/5433#discussion_r1572009913


##########
gluten-ut/spark33/src/test/scala/org/apache/spark/sql/GlutenBloomFilterAggregateQuerySuite.scala:
##########
@@ -113,4 +113,37 @@ class GlutenBloomFilterAggregateQuerySuite
       }
     }
   }
+
+  testGluten("Test bloom_filter_agg fallback with might_contain offloaded") {
+    val table = "bloom_filter_test"
+    val numEstimatedItems = 5000000L
+    val numBits = GlutenConfig.getConf.veloxBloomFilterMaxNumBits
+    val sqlString = s"""
+                       |SELECT col positive_membership_test
+                       |FROM $table
+                       |WHERE might_contain(
+                       |            (SELECT bloom_filter_agg(col,
+                       |              cast($numEstimatedItems as long),
+                       |              cast($numBits as long))
+                       |             FROM $table), col)
+                      """.stripMargin
+
+    withTempView(table) {
+      (Seq(Long.MinValue, 0, Long.MaxValue) ++ (1L to 200000L))
+        .toDF("col")
+        .createOrReplaceTempView(table)
+      withSQLConf(
+        GlutenConfig.COLUMNAR_HASHAGG_ENABLED.key -> "false"

Review Comment:
   > If this is the only case that triggers bloom_filter_agg fallback?
   
   There are probably other cases that make the agg fall back, e.g., validation 
failures caused by other agg functions. Because the agg and might_contain are 
not in the same query/sub-query, and because AQE on/off and other 
validation/transformation rules also have to be taken into account, 
implementing such a co-fallback would be very messy. Let's continue with the 
new approach introduced in 
https://github.com/apache/incubator-gluten/pull/5435, which lets vanilla Spark 
run Velox's bloom filter, so that we can thoroughly solve all the issues 
related to bloom filter mismatch, including these fallback problems.
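
   To make the mismatch concern concrete, here is a toy sketch. It is not 
Spark's or Velox's real bloom filter; `ToyBloomFilter`, `engineAHash`, and 
`engineBHash` are hypothetical names invented for illustration. It assumes only 
that the two engines map values to bit positions differently, and shows that 
bits built by one "engine" can produce false negatives when probed by the other, 
which is what would go wrong if the agg fell back while might_contain stayed 
offloaded.

```scala
import scala.collection.mutable

// Toy single-hash bloom filter. Hypothetical: it does not mirror the real
// Spark or Velox implementations, only the idea of "bits + hash function".
final class ToyBloomFilter(numBits: Int, hash: Long => Long) {
  private val bits = new mutable.BitSet(numBits)

  // Map a value to a bit position using this filter's hash function.
  private def index(value: Long): Int =
    java.lang.Math.floorMod(hash(value), numBits.toLong).toInt

  def put(value: Long): Unit = bits += index(value)

  def mightContain(value: Long): Boolean = bits.contains(index(value))

  // Copy the same bits into a filter that probes with a different hash,
  // mimicking one engine reading a filter that another engine built.
  def reinterpretWith(otherHash: Long => Long): ToyBloomFilter = {
    val other = new ToyBloomFilter(numBits, otherHash)
    other.bits ++= bits
    other
  }
}

object BloomFilterMismatchDemo {
  // Two made-up hash functions standing in for two different engines.
  val engineAHash: Long => Long = v => v
  val engineBHash: Long => Long = v => v * 31 + 7

  def main(args: Array[String]): Unit = {
    val built = new ToyBloomFilter(64, engineAHash)
    (1L to 10L).foreach(built.put)

    // Built and probed by the same engine: inserted values are always found.
    val sameEngineOk = (1L to 10L).forall(built.mightContain)

    // Built by engine A, probed by engine B: inserted values can be missed.
    val probedByB = built.reinterpretWith(engineBHash)
    val crossEngineOk = (1L to 10L).forall(probedByB.mightContain)

    println(s"same engine, all inserted values found: $sameEngineOk")
    println(s"cross engine, all inserted values found: $crossEngineOk")
  }
}
```

   A false negative here would silently drop matching rows from the query 
result, which is why having both sides use one bloom filter implementation is 
more robust than trying to keep the two operators' fallback decisions in sync.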



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

