This is an automated email from the ASF dual-hosted git repository.
yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 8fa9a79b1b5a [SPARK-55112][SQL][TESTS] Add micro-benchmark for `o.a.spark.util.sketch.BloomFilter`
8fa9a79b1b5a is described below
commit 8fa9a79b1b5a70a8665ed40864e0e48d2ca69b60
Author: yangjie01 <[email protected]>
AuthorDate: Thu Jan 22 12:38:02 2026 +0800
[SPARK-55112][SQL][TESTS] Add micro-benchmark for `o.a.spark.util.sketch.BloomFilter`
### What changes were proposed in this pull request?
This PR aims to add a micro-benchmark for `o.a.spark.util.sketch.BloomFilter`.
### Why are the changes needed?
To facilitate subsequent tracking and optimization of the `BloomFilter`'s
performance.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53882 from LuciferYang/sparkbf-bench.
Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: LuciferYang <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
---
.../SparkBloomFilterBenchmark-jdk21-results.txt | 300 +++++++++++++++++++
.../SparkBloomFilterBenchmark-results.txt | 300 +++++++++++++++++++
.../spark/sql/SparkBloomFilterBenchmark.scala | 318 +++++++++++++++++++++
3 files changed, 918 insertions(+)
diff --git a/sql/catalyst/benchmarks/SparkBloomFilterBenchmark-jdk21-results.txt b/sql/catalyst/benchmarks/SparkBloomFilterBenchmark-jdk21-results.txt
new file mode 100644
index 000000000000..1a63ad957e8c
--- /dev/null
+++ b/sql/catalyst/benchmarks/SparkBloomFilterBenchmark-jdk21-results.txt
@@ -0,0 +1,300 @@
+================================================================================================
+Put Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 10000 items, FPP: 0.03:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                            0              0           0         24.3          41.2       1.0X
+BloomFilterImplV2 - 10000                             0              0           0         21.6          46.3       0.9X
+
+
+================================================================================================
+Put Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           5              5           0         20.2          49.5       1.0X
+BloomFilterImplV2 - 100000                            5              5           0         19.4          51.7       1.0X
+
+
+================================================================================================
+Put Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                         55             55           0         18.3          54.5       1.0X
+BloomFilterImplV2 - 1000000                          58             58           0         17.4          57.6       0.9X
+
+
+================================================================================================
+MightContain Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 10000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                                                     0              0           0         39.0          25.6       1.0X
+BloomFilterImplV2 - 10000                                                      0              0           0         48.5          20.6       1.2X
+
+
+================================================================================================
+MightContain Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         37.0          27.0       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         31.5          31.7       0.9X
+
+
+================================================================================================
+MightContain Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+---------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                                                    32             32           0         31.5          31.7       1.0X
+BloomFilterImplV2 - 1000000                                                     37             37           0         27.0          37.0       0.9X
+
+
+================================================================================================
+FPP Impact on Put Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           7              7           0         15.4          65.1       1.0X
+BloomFilterImplV2 - 100000                            7              7           0         14.5          69.2       0.9X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           5              5           0         20.2          49.4       1.0X
+BloomFilterImplV2 - 100000                            5              5           0         19.4          51.6       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           4              4           0         24.0          41.6       1.0X
+BloomFilterImplV2 - 100000                            4              4           0         23.3          43.0       1.0X
+
+
+================================================================================================
+FPP Impact on Query Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         33.1          30.3       1.0X
+BloomFilterImplV2 - 100000                                                      4              4           0         28.1          35.6       0.9X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         37.0          27.0       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         31.4          31.9       0.8X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         39.9          25.1       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         33.6          29.8       0.8X
+
+
+================================================================================================
+Hit Rate Impact Analysis
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 10.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         32.0          31.2       1.0X
+BloomFilterImplV2 - 100000                                                      4              4           0         27.1          36.9       0.8X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         37.0          27.0       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         31.5          31.8       0.9X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 90.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     2              2           0         44.1          22.7       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         37.6          26.6       0.9X
+
+
+================================================================================================
+Binary Put Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 10000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                                 1              1           0         11.3          88.3       1.0X
+BloomFilterImplV2 - 10000                                  1              1           0          9.6         104.0       0.8X
+
+
+================================================================================================
+Binary Put Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                 9              9           0         10.7          93.3       1.0X
+BloomFilterImplV2 - 100000                                 10             10           0          9.8         102.0       0.9X
+
+
+================================================================================================
+Binary Put Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                               102            102           0          9.8         101.8       1.0X
+BloomFilterImplV2 - 1000000                                117            117           0          8.5         117.0       0.9X
+
+
+================================================================================================
+Binary Query Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 10000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                                                     1              1           0         15.9          63.0       1.0X
+BloomFilterImplV2 - 10000                                                      1              1           0         16.1          62.2       1.0X
+
+
+================================================================================================
+Binary Query Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         14.9          66.9       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         14.9          67.1       1.0X
+
+
+================================================================================================
+Binary Query Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+---------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                                                    72             72           0         13.8          72.2       1.0X
+BloomFilterImplV2 - 1000000                                                     75             76           1         13.3          74.9       1.0X
+
+
+================================================================================================
+FPP Impact on Binary Put Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                11             11           0          9.1         109.5       1.0X
+BloomFilterImplV2 - 100000                                 13             13           0          8.0         125.4       0.9X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                 9              9           0         10.7          93.3       1.0X
+BloomFilterImplV2 - 100000                                 10             10           0          9.8         101.9       0.9X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                 8              8           0         12.1          82.6       1.0X
+BloomFilterImplV2 - 100000                                  9              9           0         11.0          91.3       0.9X
+
+
+================================================================================================
+FPP Impact on Binary Query Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         14.2          70.2       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         13.9          71.9       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         14.9          67.2       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         14.9          67.3       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     6              7           0         15.4          64.9       1.0X
+BloomFilterImplV2 - 100000                                                      6              6           0         15.5          64.6       1.0X
+
+
+================================================================================================
+Hit Rate Impact on Binary Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 10.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         14.7          67.9       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         14.6          68.3       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         14.9          67.1       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         14.9          67.1       1.0X
+
+OpenJDK 64-Bit Server VM 21.0.10+7-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 90.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         15.1          66.3       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         15.2          65.9       1.0X
+
+
diff --git a/sql/catalyst/benchmarks/SparkBloomFilterBenchmark-results.txt b/sql/catalyst/benchmarks/SparkBloomFilterBenchmark-results.txt
new file mode 100644
index 000000000000..d709d2b5e4fa
--- /dev/null
+++ b/sql/catalyst/benchmarks/SparkBloomFilterBenchmark-results.txt
@@ -0,0 +1,300 @@
+================================================================================================
+Put Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 10000 items, FPP: 0.03:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                            0              0           0         21.0          47.7       1.0X
+BloomFilterImplV2 - 10000                             1              1           0         19.0          52.7       0.9X
+
+
+================================================================================================
+Put Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           5              5           0         20.4          49.1       1.0X
+BloomFilterImplV2 - 100000                            6              6           0         17.6          56.9       0.9X
+
+
+================================================================================================
+Put Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                         54             55           0         18.4          54.5       1.0X
+BloomFilterImplV2 - 1000000                          63             63           0         15.9          62.9       0.9X
+
+
+================================================================================================
+MightContain Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 10000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                                                     0              0           0         35.4          28.2       1.0X
+BloomFilterImplV2 - 10000                                                      0              0           0         33.1          30.2       0.9X
+
+
+================================================================================================
+MightContain Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         32.4          30.8       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         29.6          33.7       0.9X
+
+
+================================================================================================
+MightContain Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+---------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                                                    36             36           0         28.1          35.6       1.0X
+BloomFilterImplV2 - 1000000                                                     39             39           0         25.6          39.0       0.9X
+
+
+================================================================================================
+FPP Impact on Put Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           6              6           0         15.8          63.5       1.0X
+BloomFilterImplV2 - 100000                            7              7           0         13.5          74.1       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           5              5           0         20.3          49.2       1.0X
+BloomFilterImplV2 - 100000                            6              6           0         17.5          57.0       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Put Operation - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                           4              4           0         24.1          41.5       1.0X
+BloomFilterImplV2 - 100000                            5              5           0         20.8          48.1       0.9X
+
+
+================================================================================================
+FPP Impact on Query Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         29.7          33.6       1.0X
+BloomFilterImplV2 - 100000                                                      4              4           0         26.7          37.4       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         32.4          30.9       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         29.7          33.7       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         35.0          28.6       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         31.7          31.6       0.9X
+
+
+================================================================================================
+Hit Rate Impact Analysis
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 10.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     4              4           0         28.5          35.1       1.0X
+BloomFilterImplV2 - 100000                                                      4              4           0         25.7          38.9       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         32.4          30.9       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         29.7          33.7       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+MightContain Operation (Hit Rate: 90.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     3              3           0         37.7          26.5       1.0X
+BloomFilterImplV2 - 100000                                                      3              3           0         35.0          28.6       0.9X
+
+
+================================================================================================
+Binary Put Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 10000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-----------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000                                 1              1           0         10.6          94.6       1.0X
+BloomFilterImplV2 - 10000                                  1              1           0          9.6         104.2       0.9X
+
+
+================================================================================================
+Binary Put Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                10             10           0         10.0          99.5       1.0X
+BloomFilterImplV2 - 100000                                 11             11           0          8.8         113.5       0.9X
+
+
+================================================================================================
+Binary Put Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                               109            109           0          9.2         109.2       1.0X
+BloomFilterImplV2 - 1000000                                125            125           0          8.0         124.8       0.9X
+
+
+================================================================================================
+Binary Query Operation - Small Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 10000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 10000
1 1 0 14.7 68.1 1.0X
+BloomFilterImplV2 - 10000
1 1 0 13.7 72.9 0.9X
+
+
+================================================================================================
+Binary Query Operation - Medium Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         13.6          73.7       1.0X
+BloomFilterImplV2 - 100000                                                      8              8           0         12.8          78.1       0.9X
+
+
+================================================================================================
+Binary Query Operation - Large Scale
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 1000000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+---------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 1000000                                                    83             83           0         12.0          83.0       1.0X
+BloomFilterImplV2 - 1000000                                                     88             88           0         11.4          87.5       0.9X
+
+
+================================================================================================
+FPP Impact on Binary Put Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                11             11           0          8.9         112.4       1.0X
+BloomFilterImplV2 - 100000                                 13             13           0          7.5         134.0       0.8X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                10             10           0         10.0          99.6       1.0X
+BloomFilterImplV2 - 100000                                 11             11           0          8.8         113.7       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary PUT Operation - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                 9              9           0         10.9          91.8       1.0X
+BloomFilterImplV2 - 100000                                 10             10           0          9.9         101.0       0.9X
+
+
+================================================================================================
+FPP Impact on Binary Query Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.01:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     8              8           0         12.7          78.8       1.0X
+BloomFilterImplV2 - 100000                                                      8              8           0         11.8          84.5       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         13.6          73.5       1.0X
+BloomFilterImplV2 - 100000                                                      8              8           0         12.8          78.1       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.05:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         14.0          71.2       1.0X
+BloomFilterImplV2 - 100000                                                      7              7           0         13.7          73.2       1.0X
+
+
+================================================================================================
+Hit Rate Impact on Binary Operations
+================================================================================================
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 10.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         13.5          73.9       1.0X
+BloomFilterImplV2 - 100000                                                      8              8           0         13.0          76.6       1.0X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 50.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              7           0         13.6          73.7       1.0X
+BloomFilterImplV2 - 100000                                                      8              8           0         12.8          78.0       0.9X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-LTS on Linux 6.11.0-1018-azure
+AMD EPYC 7763 64-Core Processor
+Binary Query Operation (Hit Rate: 90.0%) - 100000 items, FPP: 0.03:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+--------------------------------------------------------------------------------------------------------------------------------------------------
+BloomFilterImpl V1 - 100000                                                     7              8           0         13.6          73.5       1.0X
+BloomFilterImplV2 - 100000                                                      8              8           0         12.6          79.6       0.9X
+
+
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/SparkBloomFilterBenchmark.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/SparkBloomFilterBenchmark.scala
new file mode 100644
index 000000000000..0e299bc981f0
--- /dev/null
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/SparkBloomFilterBenchmark.scala
@@ -0,0 +1,318 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.util.sketch.BloomFilter._
+import org.apache.spark.util.sketch.BloomFilter.Version._
+
+/**
+ * Benchmark for Spark's BloomFilter implementations (BloomFilterImpl and BloomFilterImplV2)
+ *
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> <spark catalyst test jar>
+ * 2. build/sbt "catalyst/Test/runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "catalyst/Test/runMain <this class>"
+ * Results will be written to "benchmarks/SparkBloomFilterBenchmark-results.txt".
+ * }}}
+ */
+object SparkBloomFilterBenchmark extends BenchmarkBase {
+
+ private val DEFAULT_SEED = 0
+ private val SMALL_ITEMS = 10000
+ private val MEDIUM_ITEMS = 100000
+ private val LARGE_ITEMS = 1000000
+
+ /**
+ * Tests PUT operation performance with different data sizes.
+ */
+ private def benchmarkPutOperation(
+ numItems: Int,
+ valuesPerIteration: Int,
+ fpp: Double = 0.03): Unit = {
+ val benchmark = new Benchmark(
+ s"Put Operation - $numItems items, FPP: $fpp",
+ valuesPerIteration,
+ output = output)
+
+ // Test BloomFilterImpl (V1)
+ benchmark.addCase(s"BloomFilterImpl V1 - $numItems", 3) { _ =>
+ val bf = create(V1, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.put(i.toLong)
+ i += 1
+ }
+ }
+
+ // Test BloomFilterImplV2 (V2)
+ benchmark.addCase(s"BloomFilterImplV2 - $numItems", 3) { _ =>
+ val bf = create(V2, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.put(i.toLong)
+ i += 1
+ }
+ }
+
+ benchmark.run()
+ }
+
+ /**
+ * Tests query operation performance with different hit rates and data sizes.
+ */
+ private def benchmarkMightContainOperation(
+ numItems: Int,
+ valuesPerIteration: Int,
+ fpp: Double = 0.03,
+ hitRate: Double = 0.5): Unit = {
+ val benchmark = new Benchmark(
+ s"MightContain Operation (Hit Rate: ${hitRate * 100}%) - $numItems items, FPP: $fpp",
+ valuesPerIteration,
+ output = output)
+
+ // Prepare BloomFilter with existing items
+ val existingItems = (0 until numItems).toArray
+ val testItems = (0 until valuesPerIteration).map { i =>
+ if (i < (valuesPerIteration * hitRate).toInt) {
+ // Existing items (hits)
+ existingItems(i % numItems).toLong
+ } else {
+ // New items (potential misses)
+ (numItems + i).toLong
+ }
+ }.toArray
+
+ // Test BloomFilterImpl (V1)
+ benchmark.addTimerCase(s"BloomFilterImpl V1 - $numItems", 3) { timer =>
+ val bf = create(V1, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ // Populate with existing items
+ existingItems.foreach(i => bf.put(i.toLong))
+
+ timer.startTiming()
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.mightContain(testItems(i))
+ i += 1
+ }
+ timer.stopTiming()
+ }
+
+ // Test BloomFilterImplV2 (V2)
+ benchmark.addTimerCase(s"BloomFilterImplV2 - $numItems", 3) { timer =>
+ val bf = create(V2, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ // Populate with existing items
+ existingItems.foreach(i => bf.put(i.toLong))
+
+ timer.startTiming()
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.mightContain(testItems(i))
+ i += 1
+ }
+ timer.stopTiming()
+ }
+
+ benchmark.run()
+ }
+
+ private def benchmarkBinaryPutOperation(
+ numItems: Int,
+ valuesPerIteration: Int,
+ fpp: Double = 0.03): Unit = {
+ val benchmark = new Benchmark(
+ s"Binary PUT Operation - $numItems items, FPP: $fpp",
+ valuesPerIteration,
+ output = output)
+
+ // Prepare binary data - simple UTF-8 conversion like putString implementation
+ val binaryData = (0 until numItems).map { i =>
+ s"item_${i}_test_binary_data_for_put_${System.currentTimeMillis()}".getBytes("UTF-8")
+ }.toArray
+
+ // Test V1 with binary PUT
+ benchmark.addCase(s"BloomFilterImpl V1 - $numItems", 3) { _ =>
+ val bf = create(V1, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.putBinary(binaryData(i % numItems))
+ i += 1
+ }
+ }
+
+ // Test V2 with binary PUT
+ benchmark.addCase(s"BloomFilterImplV2 - $numItems", 3) { _ =>
+ val bf = create(V2, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.putBinary(binaryData(i % numItems))
+ i += 1
+ }
+ }
+
+ benchmark.run()
+ }
+
+ private def benchmarkBinaryMightContainOperation(
+ numItems: Int,
+ valuesPerIteration: Int,
+ fpp: Double = 0.03,
+ hitRate: Double = 0.5): Unit = {
+ val benchmark = new Benchmark(
+ s"Binary Query Operation (Hit Rate: ${hitRate*100}%) - $numItems items, FPP: $fpp",
+ valuesPerIteration,
+ output = output)
+
+ // Prepare binary data for existing items
+ val binaryData = (0 until numItems).map { i =>
+ s"item_${i}_test_binary_data_for_query_${System.currentTimeMillis()}".getBytes("UTF-8")
+ }.toArray
+
+ // Prepare query data with specified hit rate
+ val queryBinaryData = (0 until valuesPerIteration).map { i =>
+ if (i < (valuesPerIteration * hitRate).toInt) {
+ binaryData(i % numItems) // Existing data (hit)
+ } else {
+ // New binary data (likely miss)
+ s"new_item_${i}_not_in_filter_${System.currentTimeMillis()}".getBytes("UTF-8")
+ }
+ }.toArray
+
+ // Test V1 with binary QUERY
+ benchmark.addTimerCase(s"BloomFilterImpl V1 - $numItems", 3) { timer =>
+ val bf = create(V1, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ // Populate with binary data
+ binaryData.foreach(data => bf.putBinary(data))
+
+ timer.startTiming()
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.mightContainBinary(queryBinaryData(i))
+ i += 1
+ }
+ timer.stopTiming()
+ }
+
+ // Test V2 with binary QUERY
+ benchmark.addTimerCase(s"BloomFilterImplV2 - $numItems", 3) { timer =>
+ val bf = create(V2, numItems, optimalNumOfBits(numItems, fpp), DEFAULT_SEED)
+ // Populate with binary data
+ binaryData.foreach(data => bf.putBinary(data))
+
+ timer.startTiming()
+ var i = 0
+ while (i < valuesPerIteration) {
+ bf.mightContainBinary(queryBinaryData(i))
+ i += 1
+ }
+ timer.stopTiming()
+ }
+
+ benchmark.run()
+ }
+
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+
+ runBenchmark("Put Operation - Small Scale") {
+ benchmarkPutOperation(SMALL_ITEMS, 10000)
+ }
+
+ runBenchmark("Put Operation - Medium Scale") {
+ benchmarkPutOperation(MEDIUM_ITEMS, 100000)
+ }
+
+ runBenchmark("Put Operation - Large Scale") {
+ benchmarkPutOperation(LARGE_ITEMS, 1000000)
+ }
+
+ runBenchmark("MightContain Operation - Small Scale") {
+ benchmarkMightContainOperation(SMALL_ITEMS, 10000, 0.03, 0.5)
+ }
+
+ runBenchmark("MightContain Operation - Medium Scale") {
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.5)
+ }
+
+ runBenchmark("MightContain Operation - Large Scale") {
+ benchmarkMightContainOperation(LARGE_ITEMS, 1000000, 0.03, 0.5)
+ }
+
+ runBenchmark("FPP Impact on Put Operations") {
+ benchmarkPutOperation(MEDIUM_ITEMS, 100000, 0.01) // Low FPP
+ benchmarkPutOperation(MEDIUM_ITEMS, 100000, 0.03) // Default FPP
+ benchmarkPutOperation(MEDIUM_ITEMS, 100000, 0.05) // High FPP
+ }
+
+ runBenchmark("FPP Impact on Query Operations") {
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.01, 0.5) // Low FPP, 50% hit rate
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.5) // Default FPP, 50% hit rate
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.05, 0.5) // High FPP, 50% hit rate
+ }
+
+ runBenchmark("Hit Rate Impact Analysis") {
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.1) // 10% hit rate
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.5) // 50% hit rate
+ benchmarkMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.9) // 90% hit rate
+ }
+
+ runBenchmark("Binary Put Operation - Small Scale") {
+ benchmarkBinaryPutOperation(SMALL_ITEMS, 10000)
+ }
+
+ runBenchmark("Binary Put Operation - Medium Scale") {
+ benchmarkBinaryPutOperation(MEDIUM_ITEMS, 100000)
+ }
+
+ runBenchmark("Binary Put Operation - Large Scale") {
+ benchmarkBinaryPutOperation(LARGE_ITEMS, 1000000)
+ }
+
+ runBenchmark("Binary Query Operation - Small Scale") {
+ benchmarkBinaryMightContainOperation(SMALL_ITEMS, 10000, 0.03, 0.5)
+ }
+
+ runBenchmark("Binary Query Operation - Medium Scale") {
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.5)
+ }
+
+ runBenchmark("Binary Query Operation - Large Scale") {
+ benchmarkBinaryMightContainOperation(LARGE_ITEMS, 1000000, 0.03, 0.5)
+ }
+
+ runBenchmark("FPP Impact on Binary Put Operations") {
+ benchmarkBinaryPutOperation(MEDIUM_ITEMS, 100000, 0.01) // Low FPP
+ benchmarkBinaryPutOperation(MEDIUM_ITEMS, 100000, 0.03) // Default FPP
+ benchmarkBinaryPutOperation(MEDIUM_ITEMS, 100000, 0.05) // High FPP
+ }
+
+ runBenchmark("FPP Impact on Binary Query Operations") {
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.01, 0.5) // Low FPP
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.5) // Default FPP
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.05, 0.5) // High FPP
+ }
+
+ runBenchmark("Hit Rate Impact on Binary Operations") {
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.1) // 10% hit rate
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.5) // 50% hit rate
+ benchmarkBinaryMightContainOperation(MEDIUM_ITEMS, 100000, 0.03, 0.9) // 90% hit rate
+ }
+ }
+}
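
Reading the result tables: the sketch below is not part of the commit. It shows one way the Rate(M/s), Per Row(ns) and Relative columns can be derived from the best time and the item count. The formulas are an assumption inferred from the numbers in the tables, and the object and method names (ResultColumns, rateMPerSec, perRowNs, relative) are hypothetical. The unrounded best times used in main (109.2 ms and 124.8 ms) are likewise inferred from the Per Row(ns) column of the "Binary Put Operation - Large Scale" table.

```scala
// Sketch only: formulas inferred from the result tables, not copied from
// org.apache.spark.benchmark.Benchmark. All names here are hypothetical.
object ResultColumns {

  /** Throughput in millions of items per second, given the best time in ms. */
  def rateMPerSec(numItems: Long, bestMs: Double): Double =
    numItems / (bestMs * 1000.0)

  /** Cost per item in nanoseconds, given the best time in ms. */
  def perRowNs(numItems: Long, bestMs: Double): Double =
    bestMs * 1e6 / numItems

  /** Speed relative to the first (baseline) case; a value below 1.0 means slower. */
  def relative(baselineBestMs: Double, caseBestMs: Double): Double =
    baselineBestMs / caseBestMs

  def main(args: Array[String]): Unit = {
    val n = 1000000L
    val v1BestMs = 109.2 // inferred: 109.2 ns per row x 1,000,000 rows
    val v2BestMs = 124.8 // inferred: 124.8 ns per row x 1,000,000 rows

    println(f"V1 rate:     ${rateMPerSec(n, v1BestMs)}%.1f M/s")
    println(f"V2 rate:     ${rateMPerSec(n, v2BestMs)}%.1f M/s")
    println(f"V2 per row:  ${perRowNs(n, v2BestMs)}%.1f ns")
    println(f"V2 relative: ${relative(v1BestMs, v2BestMs)}%.1fX")
  }
}
```

Under these assumptions, a Relative value of 0.9X for BloomFilterImplV2 means its best time was roughly 10% longer than the V1 baseline in the same table.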