brijrajk commented on PR #12151: URL: https://github.com/apache/gluten/pull/12151#issuecomment-4683765567
Thanks for flagging this, @philo-he! Both of Copilot's comments were valid:

**1. Patcher active when native bloom filter is disabled**

When `spark.gluten.sql.native.bloomFilter=false`, Stage 0 falls back to Spark and produces Spark-format bytes. The joint-fallback rule still wraps Stage 1 in a `FallbackNode`, so the patcher was incorrectly rewriting it to `VeloxBloomFilterMightContain`, which would cause the same `IOException` the patcher was introduced to fix, just from the opposite trigger.

Added a second guard: `if (!GlutenConfig.get.enableNativeBloomFilter) return plan`. This mirrors the existing guard already in `BloomFilterMightContainJointRewriteRule`.

**2. `df.collect` + `df.count()` runs the query twice**

Combined into `assert(df.collect().length == 200003L)`: single execution, same failure signal if the `IOException` is thrown.
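
For readers following along, here is a minimal, self-contained sketch of how the guard in point 1 short-circuits the rewrite. It is not the code from this PR: `Plan`, `SparkMightContain`, `VeloxMightContain`, `Conf`, and `PatcherSketch` are hypothetical stand-ins, and the `FallbackNode` here is a toy case class rather than Gluten's. Only the shape of the early return is taken from the comment above.

```scala
// Hypothetical stand-ins for plan nodes; these are NOT Gluten's real classes.
sealed trait Plan
case class FallbackNode(child: Plan) extends Plan
case class SparkMightContain(filter: String) extends Plan
case class VeloxMightContain(filter: String) extends Plan

// Hypothetical config holder standing in for GlutenConfig.get.
case class Conf(enableNativeBloomFilter: Boolean)

object PatcherSketch {
  def patch(plan: Plan, conf: Conf): Plan = {
    // Guard: when the native bloom filter is disabled, Stage 0 produced
    // Spark-format bytes, so the plan must be left untouched.
    if (!conf.enableNativeBloomFilter) return plan

    plan match {
      // Toy version of the rewrite: a fallback-wrapped Spark node is
      // swapped for the Velox variant.
      case FallbackNode(SparkMightContain(filter)) => VeloxMightContain(filter)
      case other => other
    }
  }
}
```

With the flag off, `PatcherSketch.patch` returns its input unchanged; with the flag on, the wrapped node is rewritten. The early `return` mirrors the style of the guard quoted above; an `if`/`else` expression would behave the same way.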
