Re: [D] [Performance] Velox Bloom Filter Inefficiency vs. Photon at 1TB Scale [incubator-gluten]

via GitHub Tue, 03 Feb 2026 09:35:23 -0800


GitHub user zhouyuan added a comment to the discussion: [Performance] Velox 
Bloom Filter Inefficiency vs. Photon at 1TB Scale


Hi @shadowmmu, thanks for the detailed information. I see the issue now. This 
appears to be a hard limit in the memory allocator, and modifying this line may 
help: https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492
Cc: @zhli1142015 

For broadcast hash join (BHJ), Gluten has a WIP patch 
(https://github.com/apache/incubator-gluten/pull/8931) that should 
significantly improve performance for large BHJ workloads.

On the query planning side, DBX performs much better than vanilla Spark, 
especially on the DS benchmark. There are also several useful optimizations 
from the Spark community that have not been merged yet. Some cloud vendors have 
also improved Spark Catalyst in their products to get a better planner.

For the aggregation perf difference, could you please also share the example 
queries? 

thanks, -yuan

GitHub link: 
https://github.com/apache/incubator-gluten/discussions/11554#discussioncomment-15686284

