brijrajk commented on PR #12151:
URL: https://github.com/apache/gluten/pull/12151#issuecomment-4683765567

   Thanks for flagging this, @philo-he!
   
   Both of Copilot's comments were valid:
   
   **1. Patcher active when native bloom filter is disabled**
   
   When `spark.gluten.sql.native.bloomFilter=false`, Stage 0 falls back to 
Spark and produces Spark-format bytes. The joint-fallback rule still wraps 
Stage 1 in a `FallbackNode`, so the patcher was incorrectly rewriting it to 
`VeloxBloomFilterMightContain`. That would cause the same `IOException` the 
patcher was introduced to fix, just from the opposite trigger.
   
   Added a second guard: 
`if (!GlutenConfig.get.enableNativeBloomFilter) return plan`. This mirrors the 
guard already in `BloomFilterMightContainJointRewriteRule`.
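
   For illustration, the shape of that guard can be sketched as below. This is 
a simplified, self-contained sketch and not the actual Gluten code: `Plan`, 
`SparkMightContain`, `VeloxMightContain`, `Conf`, and `BloomFilterPatcher` are 
hypothetical stand-ins, and the rewrite shown is only an assumed approximation 
of what the patcher does.

```scala
// Hypothetical stand-in types; the real rule operates on Spark plans.
sealed trait Plan
case class FallbackNode(child: Plan) extends Plan
case class SparkMightContain(filterId: String) extends Plan
case class VeloxMightContain(filterId: String) extends Plan

// Hypothetical stand-in for the relevant GlutenConfig flag.
final case class Conf(enableNativeBloomFilter: Boolean)

object BloomFilterPatcher {
  def patch(plan: Plan, conf: Conf): Plan = {
    // Guard: with the native bloom filter disabled, Stage 0 emits
    // Spark-format bytes, so the plan must be left untouched.
    if (!conf.enableNativeBloomFilter) return plan

    plan match {
      // Assumed rewrite: swap the Spark expression for the Velox one
      // inside a fallback-wrapped stage.
      case FallbackNode(SparkMightContain(id)) => FallbackNode(VeloxMightContain(id))
      case other => other
    }
  }
}
```

   With the flag off, `patch` returns its input unchanged. With the flag on, 
the fallback-wrapped expression is rewritten as before.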
   
   **2. `df.collect()` + `df.count()` runs the query twice**
   
   Combined into `assert(df.collect().length == 200003L)`: a single execution, 
with the same failure signal if the `IOException` is thrown.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
