This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new d72e02897617 [SPARK-53077][CORE][TESTS][FOLLOWUP] Reduce insertion count in SparkBloomFilterSuite

d72e02897617 is described below

commit d72e02897617e185e35ba8890c4ba763e68e950e
Author: Ish Nagy <i...@ishnagy.eu>
AuthorDate: Tue Aug 5 07:58:33 2025 -0700

[SPARK-53077][CORE][TESTS][FOLLOWUP] Reduce insertion count in SparkBloomFilterSuite

## Reduce insertion count in SparkBloomFilterSuite to mitigate long running time

### What changes were proposed in this pull request?

This change reduces the insertion count in the `SparkBloomFilterSuite` test suite to the bare minimum necessary to demonstrate the int truncation bug in the V1 version of `BloomFilterImpl`.

### Why are the changes needed?

https://github.com/apache/spark/pull/50933 introduced a new `SparkBloomFilterSuite` test suite, which increased the test running time of the common/sketch module from about 7 seconds to a whopping 12 minutes. This change is a workaround to decrease the test running time until we can devise a way to trigger these long-running tests only when there are actual changes in `common/sketch`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
The minimum insertion count was selected based on the following measurements with the V1 version of `BloomFilterImpl`:

```
100M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 3.050257 %)  [00m18s]  T:   ~9.6%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 3.053887 %)  [00m09s]  T:   ~9.3%
150M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 3.080157 %)  [00m28s]  T:  ~15.0%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 3.079987 %)  [00m15s]  T:  ~15.4%
200M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 3.861257 %)  [00m37s]  T:  ~19.8%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 3.860424 %)  [00m20s]  T:  ~20.6%
250M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 3.676172 %)  [00m47s]  T:  ~25.1%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 3.675387 %)  [00m25s]  T:  ~25.8%
300M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 3.210548 %)  [00m57s]  T:  ~30.5%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 3.209847 %)  [00m30s]  T:  ~30.1%
350M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 5.377388 %)  [01m07s]  T:  ~35.8%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 5.377483 %)  [00m36s]  T:  ~37.1%
400M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp ( 8.170380 %)  [01m17s]  T:  ~41.2%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp ( 8.170716 %)  [00m40s]  T:  ~41.2%
500M
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (15.392861 %)  [01m36s]  T:  ~51.3%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (15.391692 %)  [00m50s]  T:  ~51.5%
1G
  testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (59.890330 %)  [03m07s]  T: 100.0%
  testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (59.888499 %)  [01m37s]  T: 100.0%
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #51845 from ishnagy/SPARK-53077_reenable_SparkBloomFilterSuite.

Authored-by: Ish Nagy <i...@ishnagy.eu>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
---
 .../org/apache/spark/util/sketch/SparkBloomFilterSuite.java    | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/common/sketch/src/test/java/org/apache/spark/util/sketch/SparkBloomFilterSuite.java b/common/sketch/src/test/java/org/apache/spark/util/sketch/SparkBloomFilterSuite.java
index a7186853edfc..91f7423300aa 100644
--- a/common/sketch/src/test/java/org/apache/spark/util/sketch/SparkBloomFilterSuite.java
+++ b/common/sketch/src/test/java/org/apache/spark/util/sketch/SparkBloomFilterSuite.java
@@ -40,8 +40,7 @@ public class SparkBloomFilterSuite {

   // the implemented fpp limit is only approximating the hard boundary,
   // so we'll need an error threshold for the assertion
-  final double FPP_EVEN_ODD_ERROR_FACTOR = 0.10;
-  final double FPP_RANDOM_ERROR_FACTOR = 0.10;
+  final double FPP_ACCEPTABLE_ERROR_FACTOR = 0.10;

   final long ONE_GB = 1024L * 1024L * 1024L;
   final long REQUIRED_HEAP_UPPER_BOUND_IN_BYTES = 4 * ONE_GB;
@@ -106,7 +105,7 @@ public class SparkBloomFilterSuite {

     // to reduce running time to acceptable levels, we test only one case,
     // with the default FPP and the default seed only.
     return Stream.of(
-        Arguments.of(1_000_000_000L, 0.03, BloomFilterImplV2.DEFAULT_SEED)
+        Arguments.of(350_000_000L, 0.03, BloomFilterImplV2.DEFAULT_SEED)
     );

     // preferable minimum parameter space for tests:
     // {1_000_000L, 1_000_000_000L} for: long numItems
@@ -201,7 +200,7 @@ public class SparkBloomFilterSuite {
     );

     double actualFpp = mightContainOdd.doubleValue() / numItems;
-    double acceptableFpp = expectedFpp * (1 + FPP_EVEN_ODD_ERROR_FACTOR);
+    double acceptableFpp = expectedFpp * (1 + FPP_ACCEPTABLE_ERROR_FACTOR);

     testOut.printf("expectedFpp: %f %%\n", 100 * expectedFpp);
     testOut.printf("acceptableFpp: %f %%\n", 100 * acceptableFpp);
@@ -279,6 +278,7 @@ public class SparkBloomFilterSuite {
       deterministicSeed
     );

+    // V1 ignores custom seed values, so the control filter must be at least V2
     BloomFilter bloomFilterSecondary =
       BloomFilter.create(
         BloomFilter.Version.V2,
@@ -354,7 +354,7 @@ public class SparkBloomFilterSuite {
     double actualFpp =
       mightContainOddIndexed.doubleValue() / confirmedAsNotInserted.doubleValue();
-    double acceptableFpp = expectedFpp * (1 + FPP_RANDOM_ERROR_FACTOR);
+    double acceptableFpp = expectedFpp * (1 + FPP_ACCEPTABLE_ERROR_FACTOR);

     testOut.printf("mightContainOddIndexed: %10d\n", mightContainOddIndexed.longValue());
     testOut.printf("confirmedAsNotInserted: %10d\n", confirmedAsNotInserted.longValue());

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
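As background, the class of int-truncation bug this suite exercises can be sketched in broad strokes as follows. This is hypothetical illustration code, not Spark's actual `BloomFilterImpl`: once a filter holds more bits than `Integer.MAX_VALUE`, narrowing a 64-bit hash to `int` before deriving a bit index discards the upper bits, so distinct hashes collapse onto the same indices and the false-positive rate climbs with the insertion count.

```java
// Hypothetical sketch of the int-truncation failure mode; NOT Spark's code.
public class IntTruncationSketch {

  // Correct: keep the full 64-bit hash when deriving a bit index.
  static long indexFull(long hash, long numBits) {
    return Long.remainderUnsigned(hash, numBits);
  }

  // Buggy: the hash is narrowed to 32 bits before the modulo, so for filters
  // larger than Integer.MAX_VALUE bits most of the bit array is never reached
  // and distinct hashes that share their low 32 bits collide.
  static long indexTruncated(long hash, long numBits) {
    return Integer.toUnsignedLong((int) hash) % numBits;
  }

  public static void main(String[] args) {
    long numBits = 8L * 1024 * 1024 * 1024; // 2^33 bits, > Integer.MAX_VALUE
    long hash = 0x1_2345_6789L;             // a hash wider than 32 bits

    System.out.println(indexFull(hash, numBits));      // 4886718345
    System.out.println(indexTruncated(hash, numBits)); // 591751049
  }
}
```

The two indices differ exactly when the hash needs more than 32 bits, which matches the suite's observation that the bug only shows up at large insertion counts.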
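The measured `actualFpp` values can be put in context with the textbook Bloom filter sizing math (these are the standard formulas, not necessarily the exact ones Spark uses internally): a filter sized for n items at target false-positive probability p should stay near p, so the >5% rate observed at 350M insertions against a 3% target is already a clear symptom of the V1 bug.

```java
// Textbook Bloom filter sizing and false-positive estimate; illustrative
// only, not guaranteed to match Spark's internal sizing exactly.
public class BloomFppEstimate {

  // Optimal bit count for n items at target fpp p: m = -n * ln(p) / (ln 2)^2
  static long optimalNumBits(long n, double p) {
    return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }

  // Optimal number of hash functions: k = (m / n) * ln 2
  static int optimalNumHashes(long n, long m) {
    return (int) Math.round((double) m / n * Math.log(2));
  }

  // Expected false-positive probability: p ~ (1 - e^(-k*n/m))^k
  static double estimateFpp(long n, long m, int k) {
    return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
  }

  public static void main(String[] args) {
    long n = 350_000_000L;   // the insertion count the test now uses
    double targetFpp = 0.03; // the suite's expectedFpp

    long m = optimalNumBits(n, targetFpp);
    int k = optimalNumHashes(n, m);
    System.out.printf("m = %d bits, k = %d, estimated fpp = %.4f%n",
        m, k, estimateFpp(n, m, k));
  }
}
```

A correctly behaving filter sized this way lands at an estimated fpp of about 0.03, comfortably inside the suite's 10% acceptable-error factor (`acceptableFpp = expectedFpp * 1.10`).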