Re: [D] [Performance] Velox Bloom Filter Inefficiency vs. Photon at 1TB Scale [incubator-gluten]

via GitHub Tue, 03 Feb 2026 08:48:50 -0800


GitHub user zhouyuan added a comment to the discussion: [Performance] Velox 
Bloom Filter Inefficiency vs. Photon at 1TB Scale


@shadowmmu Thanks for sharing the findings and analysis.  
Gluten actually lets you configure this via:
`spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits`
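To see why the bit cap matters, here is a minimal Python sketch that uses the textbook bloom filter formulas: m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hash functions, and FPP ≈ (1 - e^(-kn/m))^k. It assumes the filter is sized roughly along these lines and then clamped to `maxNumBits`. Velox uses its own filter layout, so its exact numbers will differ, and the item counts and caps below are hypothetical values, not Gluten defaults.

```python
import math


def optimal_num_bits(expected_items: int, fpp: float) -> int:
    """Textbook bloom filter size in bits: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(expected_items: int, num_bits: int) -> int:
    """Textbook hash count k = (m / n) * ln 2, never below 1."""
    return max(1, round(num_bits / expected_items * math.log(2)))


def false_positive_rate(expected_items: int, num_bits: int, num_hashes: int) -> float:
    """Approximate false positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * expected_items / num_bits)) ** num_hashes


def capped_fpp(expected_items: int, target_fpp: float, max_num_bits: int) -> float:
    """FPP actually achieved when the ideal size is clamped to max_num_bits.

    The clamp models a maxNumBits-style cap; this is an assumption about
    how the cap is applied, not Velox's actual sizing code.
    """
    bits = min(optimal_num_bits(expected_items, target_fpp), max_num_bits)
    k = optimal_num_hashes(expected_items, bits)
    return false_positive_rate(expected_items, bits, k)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of distinct build-side keys
    print("ideal bits:", optimal_num_bits(n, 0.03))
    print("FPP with a 2^22-bit cap:", capped_fpp(n, 0.03, 2**22))
    print("FPP with a 2^27-bit cap:", capped_fpp(n, 0.03, 2**27))
```

With these hypothetical inputs, the ideal filter needs about 73 million bits. A 2^22-bit cap pushes the false positive rate to roughly 0.91, while a 2^27-bit cap leaves it near the 3% target. Once most probe rows pass the filter, it prunes almost nothing, which is why raising the cap can matter at large scale factors.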

It looks like in your example Q17 run with Gluten, the Spark runtime filter is
not triggered. Have you also tried lowering the application-side threshold?

`spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold=0`
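Both settings can be passed at submit time. The following is a minimal sketch, assuming the job is launched with `spark-submit`. The helper name `bloom_filter_conf_args`, the 2^26 bit cap, and the `your_job.py` entry point are hypothetical and chosen only for illustration. The threshold value `0` follows the suggestion above.

```python
import shlex


def bloom_filter_conf_args(max_num_bits: int, app_side_threshold: str = "0") -> list[str]:
    """Build spark-submit --conf flags for the two bloom filter settings.

    Hypothetical helper for illustration; the values are examples, not tuned defaults.
    """
    confs = {
        # Upper bound on the Velox bloom filter size, in bits (Gluten setting).
        "spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits": str(max_num_bits),
        # Aggregated scan size (in bytes) the application side must exceed before
        # Spark injects a runtime bloom filter; 0 effectively removes that bound.
        "spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold": app_side_threshold,
    }
    args: list[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "your_job.py" is a placeholder for the actual application.
    cmd = ["spark-submit", *bloom_filter_conf_args(2**26), "your_job.py"]
    print(shlex.join(cmd))
```

The same key/value pairs can also be set in code with `SparkSession.builder.config(key, value)` before the session is created.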

Please also note that DBX enables local caching by default, which can
significantly improve performance for subsequent queries in a power test run.
It also collects runtime statistics that help later queries generate better
execution plans (their CBO is enabled by default).

GitHub link: 
https://github.com/apache/incubator-gluten/discussions/11554#discussioncomment-15685927

