GitHub user zhouyuan added a comment to the discussion: [Performance] Velox Bloom Filter Inefficiency vs. Photon at 1TB Scale
@shadowmmu Thanks for sharing the findings and analysis. Gluten actually allows this to be configured via `spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits`.

It looks like, in your example Q17 run with Gluten, Spark's runtime filter is not triggered. Have you also tried lowering the application-side threshold?

`spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold = 0`

Please also note that DBX enables the local caching feature by default, which can significantly improve performance for subsequent queries in a power test run. It also collects runtime statistics, helping later queries generate better execution plans (their CBO is enabled by default).

GitHub link: https://github.com/apache/incubator-gluten/discussions/11554#discussioncomment-15685927
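The two settings suggested in the comment can be passed at submit time as `--conf key=value` pairs. The sketch below assembles those arguments. It assumes you launch jobs with `spark-submit`. The helper names `bloom_filter_confs` and `to_submit_args` are hypothetical, and the `maxNumBits` value used in the example is purely illustrative, not a recommended or default setting.

```python
# Minimal sketch: build spark-submit --conf arguments for the two settings
# mentioned in the comment. Helper names are hypothetical; the maxNumBits
# value passed in the example is illustrative only.
from typing import Dict, List


def bloom_filter_confs(max_num_bits: int) -> Dict[str, str]:
    """Return the Gluten/Spark settings discussed in the comment."""
    return {
        # Caps the size (in bits) of the bloom filter built by Gluten's Velox backend.
        "spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits": str(max_num_bits),
        # Setting this threshold to 0 means the application-side scan size no longer
        # prevents Spark from injecting a runtime bloom filter (other conditions still apply).
        "spark.sql.optimizer.runtime.bloomFilter.applicationSideScanSizeThreshold": "0",
    }


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Flatten a conf mapping into ["--conf", "key=value", ...] arguments."""
    args: List[str] = []
    for key, value in confs.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Illustrative value only; tune it for your own data volume.
    submit_args = to_submit_args(bloom_filter_confs(max_num_bits=67108864))
    print(" ".join(submit_args))
```

The same key/value pairs could equally go into `spark-defaults.conf`. Keeping them in one mapping makes it easy to compare runs with and without the lowered threshold.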
