yabola commented on code in PR #1020:
URL: https://github.com/apache/parquet-mr/pull/1020#discussion_r1070591022


##########
parquet-column/src/test/java/org/apache/parquet/column/values/bloomfilter/TestBlockSplitBloomFilter.java:
##########
@@ -181,6 +182,60 @@ public void testBloomFilterNDVs(){
     assertTrue(bytes < 5 * 1024 * 1024);
   }
 
+  @Test
+  public void testMergeBloomFilter() throws IOException {
+    Random random = new Random();
+    int numBytes = BlockSplitBloomFilter.optimalNumOfBits(1024 * 1024, 0.01) / 8;
+    BloomFilter otherBloomFilter = new BlockSplitBloomFilter(numBytes);
+    BloomFilter mergedBloomFilter = new BlockSplitBloomFilter(numBytes);
+
+    Set<String> originStrings = new HashSet<>();
+    Set<String> testStrings = new HashSet<>();
+    Set<Integer> testInts = new HashSet<>();
+    Set<Double> testDoubles = new HashSet<>();
+    Set<Float> testFloats = new HashSet<>();
+    for (int i = 0; i < 1024; i++) {
+
+      String originStrValue = RandomStringUtils.randomAlphabetic(1, 64);
+      originStrings.add(originStrValue);
+      mergedBloomFilter.insertHash(otherBloomFilter.hash(Binary.fromString(originStrValue)));
+
+      String testString = RandomStringUtils.randomAlphabetic(1, 64);
+      testStrings.add(testString);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(Binary.fromString(testString)));
+
+      int testInt = random.nextInt();
+      testInts.add(testInt);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(testInt));
+
+      double testDouble = random.nextDouble();
+      testDoubles.add(testDouble);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(testDouble));
+
+      float testFloat = random.nextFloat();
+      testFloats.add(testFloat);
+      otherBloomFilter.insertHash(otherBloomFilter.hash(testFloat));
+    }
+    mergedBloomFilter.merge(otherBloomFilter);
+    for (String testString : originStrings) {
+      assertTrue(mergedBloomFilter.findHash(mergedBloomFilter.hash(Binary.fromString(testString))));

Review Comment:
   If the BloomFilter to be merged is not empty, there is a small probability 
that the two BloomFilters will disagree on whether a given hash value is 
present (I added random values to exercise this).
   
   But if the BloomFilter to be merged is empty at the beginning, the two 
BloomFilters should always return the same result.
   
   I added two different test cases; I am not sure whether I need to add more.
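
   To make the two cases concrete, here is a minimal sketch, assuming that merge is a bitwise OR over two filters of the same size. `ToyBloomFilter` and its two-bit hashing scheme are hypothetical and exist only for illustration; this is not the `BlockSplitBloomFilter` implementation from this PR.

```java
import java.util.Arrays;

/** Toy Bloom filter for illustrating merge semantics only (hypothetical, not parquet-mr code). */
final class ToyBloomFilter {
  private final long[] words;

  ToyBloomFilter(int numWords) {
    this.words = new long[numWords];
  }

  /** Sets the two bits derived from the hash. */
  void insertHash(long hash) {
    for (int bit : bitPositions(hash)) {
      words[bit >>> 6] |= 1L << (bit & 63);
    }
  }

  /** Returns true only if both bits derived from the hash are set. */
  boolean findHash(long hash) {
    for (int bit : bitPositions(hash)) {
      if ((words[bit >>> 6] & (1L << (bit & 63))) == 0) {
        return false;
      }
    }
    return true;
  }

  /** Assumed merge semantics: OR the other filter's bits into this one. */
  void merge(ToyBloomFilter other) {
    if (other.words.length != words.length) {
      throw new IllegalArgumentException("filters must have the same size");
    }
    for (int i = 0; i < words.length; i++) {
      words[i] |= other.words[i];
    }
  }

  boolean sameBits(ToyBloomFilter other) {
    return Arrays.equals(words, other.words);
  }

  /** Illustrative scheme: one bit from the full hash, one from its upper 32 bits. */
  private int[] bitPositions(long hash) {
    long totalBits = words.length * 64L;
    int first = (int) Math.floorMod(hash, totalBits);
    int second = (int) Math.floorMod(hash >>> 32, totalBits);
    return new int[] {first, second};
  }

  public static void main(String[] args) {
    ToyBloomFilter source = new ToyBloomFilter(4);
    source.insertHash(12345L);
    source.insertHash(-987654321L);

    // Case 1: the target starts empty, so after merge it is bit-for-bit the source.
    ToyBloomFilter emptyTarget = new ToyBloomFilter(4);
    emptyTarget.merge(source);
    System.out.println(emptyTarget.sameBits(source)); // true

    // Case 2: the target already holds a value, so it can answer differently
    // from the source for hashes the source never saw.
    ToyBloomFilter nonEmptyTarget = new ToyBloomFilter(4);
    nonEmptyTarget.insertHash(42L);
    nonEmptyTarget.merge(source);
    System.out.println(nonEmptyTarget.findHash(42L)); // true
    System.out.println(source.findHash(42L));         // false
  }
}
```

   Under this assumption, a merged filter never loses a value that was in either input, so a test can assert `findHash` unconditionally for inserted values, while equality with the source filter can only be asserted when the target started empty.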


