Re: [D] [Performance] Velox Bloom Filter Inefficiency vs. Photon at 1TB Scale [incubator-gluten]

via GitHub Tue, 03 Feb 2026 16:42:40 -0800


GitHub user zhli1142015 added a comment to the discussion: [Performance] Velox 
Bloom Filter Inefficiency vs. Photon at 1TB Scale


> Hi @shadowmmu, thanks for the detailed information. I see the issue now. This 
> appears to be a hard limit in the memory allocator, and modifying this line may 
> help: https://github.com/facebookincubator/velox/blob/main/velox/common/memory/MemoryAllocator.h#L492
> Cc: @zhli1142015
> 
> For BHJ, Gluten has a WIP patch (#8931) that should significantly improve 
> performance for large BHJ workloads.
> 
> On the query planning side, DBX performs much better than vanilla Spark, 
> especially on the DS benchmark. There are also several useful optimizations 
> from the Spark community that have not been merged yet. Some cloud vendors 
> have also improved Spark's Catalyst optimizer in their products to get a 
> better planner.
> 
> For the Aggregation perf diff, could you please also share the example queries?
> 
> thanks, -yuan

Internally, we’ve removed this limitation and are using Spark’s default value 
of 64 MB for bloomFilter.maxNumBits. @shadowmmu Is 1 GB the default setting for 
all queries? What is the overall impact of this?
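
For context on why the cap matters, here is a rough sketch of the textbook Bloom filter sizing arithmetic. It is not taken from the Velox or Spark source. The constant assumes that the Spark setting is spark.sql.optimizer.runtime.bloomFilter.maxNumBits and that its default is 67108864, a count of bits (64 Mi bits, about 8 MiB of storage); please check this against your Spark version. The function names, the 100 million key count and the 3% target rate are made up for illustration.

```python
import math

# Assumption: default of spark.sql.optimizer.runtime.bloomFilter.maxNumBits.
# The value is a number of bits: 64 Mi bits, i.e. about 8 MiB of storage.
SPARK_DEFAULT_MAX_NUM_BITS = 67_108_864


def optimal_num_bits(num_items: int, fpp: float) -> int:
    """Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits for n items at rate p."""
    return math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))


def best_case_fpp(num_items: int, num_bits: int) -> float:
    """Idealized false-positive rate for m bits and n items, assuming the
    optimal number of hash functions: p = exp(-(m / n) * (ln 2)^2)."""
    return math.exp(-(num_bits / num_items) * (math.log(2) ** 2))


def capped_num_bits(num_items: int, fpp: float, max_num_bits: int) -> int:
    """Bits actually available once the configured cap is applied."""
    return min(optimal_num_bits(num_items, fpp), max_num_bits)


if __name__ == "__main__":
    n = 100_000_000   # hypothetical number of build-side keys
    target = 0.03     # hypothetical target false-positive rate
    wanted = optimal_num_bits(n, target)
    granted = capped_num_bits(n, target, SPARK_DEFAULT_MAX_NUM_BITS)
    print(f"bits wanted:  {wanted:,}")
    print(f"bits granted: {granted:,}")
    print(f"best-case FPP under the cap: {best_case_fpp(n, granted):.3f}")
```

Under these assumptions, a filter for 100 million keys wants far more bits than the cap allows, so the capped filter lets most non-matching rows through. A real implementation can differ, since it may pick a different hash count or block layout.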

GitHub link: 
https://github.com/apache/incubator-gluten/discussions/11554#discussioncomment-15688885

