Luo Chen has uploaded a new change for review.

  https://asterix-gerrit.ics.uci.edu/2285

Change subject: [ASTERIXDB-2243][STO] Fix BloomFilter size estimation
......................................................................

[ASTERIXDB-2243][STO] Fix BloomFilter size estimation

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
- Fix the bloom filter size estimation by using the
actual number of elements after bulk loading. This prevents
the bloom filter from growing larger and larger under
update-heavy workloads, where most of the ingested records
are deleted through merge.
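
The effect described above can be illustrated with a minimal sketch. It is
a simplified model, not the Hyracks BloomFilter API: the class, the Builder
stand-in, and the simulate method are hypothetical names, and it assumes
that each merge sizes its new filter from the element count recorded by the
previous filter plus the newly ingested records.

```java
// Hypothetical, simplified model of the size-estimation problem.
// None of these names come from the Hyracks code base.
public class BloomFilterSizeSketch {

    /** Stand-in for a bulk-load builder that is sized from an estimate. */
    static final class Builder {
        private final long estimatedNumElements; // sizing hint given up front
        private long actualNumElements = 0;      // counted as keys are added

        Builder(long estimatedNumElements) {
            this.estimatedNumElements = estimatedNumElements;
        }

        void add() {
            actualNumElements++;
        }

        long estimated() {
            return estimatedNumElements;
        }

        long actual() {
            return actualNumElements;
        }
    }

    /**
     * Runs a chain of merges. In every round, ingestedPerRound records arrive
     * but are deleted during the merge, so only `live` keys are ever added.
     * Returns the element count recorded after the last round.
     */
    public static long simulate(int rounds, long live, long ingestedPerRound, boolean recordActual) {
        long recorded = live;
        for (int i = 0; i < rounds; i++) {
            // Assumption: the estimate is derived from the previously recorded count.
            Builder builder = new Builder(recorded + ingestedPerRound);
            for (long k = 0; k < live; k++) {
                builder.add(); // only surviving keys reach the filter
            }
            recorded = recordActual ? builder.actual() : builder.estimated();
        }
        return recorded;
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, 1000, 500, false)); // prints 3500
        System.out.println(simulate(5, 1000, 500, true));  // prints 1000
    }
}
```

Recording the estimate lets the count drift upward by the number of deleted
records on every merge, while recording the counted value keeps it at the
number of keys that were actually added.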

Change-Id: Ib4054797d969efcfceb86f91b5321d34480e25c3
---
M hyracks-fullstack/hyracks/hyracks-storage-am-bloomfilter/src/main/java/org/apache/hyracks/storage/am/bloomfilter/impls/BloomFilter.java
1 file changed, 6 insertions(+), 4 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/85/2285/1

diff --git a/hyracks-fullstack/hyracks/hyracks-storage-am-bloomfilter/src/main/java/org/apache/hyracks/storage/am/bloomfilter/impls/BloomFilter.java b/hyracks-fullstack/hyracks/hyracks-storage-am-bloomfilter/src/main/java/org/apache/hyracks/storage/am/bloomfilter/impls/BloomFilter.java
index 3d8782a..b603414 100644
--- a/hyracks-fullstack/hyracks/hyracks-storage-am-bloomfilter/src/main/java/org/apache/hyracks/storage/am/bloomfilter/impls/BloomFilter.java
+++ b/hyracks-fullstack/hyracks/hyracks-storage-am-bloomfilter/src/main/java/org/apache/hyracks/storage/am/bloomfilter/impls/BloomFilter.java
@@ -57,7 +57,7 @@
     private int numHashes;
     private long numElements;
     private long numBits;
-    // keep trace of the version of the bloomfilter to be backward compatible
+    // keep track of the version of the bloomfilter to be backward compatible
     private int version;
     private final int numBitsPerPage;
     private final int numBlocksPerPage;
@@ -281,6 +281,7 @@
         private final int numHashes;
         private final long numBits;
         private final int numPages;
+        private long actualNumElements;
         private final IFIFOPageQueue queue;
         private final ICachedPage[] pages;
         private ICachedPage metaDataPage = null;
@@ -298,6 +299,7 @@
                 throw HyracksDataException.create(ErrorCode.CANNOT_CREATE_BLOOM_FILTER_WITH_NUMBER_OF_PAGES, tmp);
             }
             numPages = (int) tmp;
+            actualNumElements = 0;
             pages = new ICachedPage[numPages];
             int currentPageId = 1;
             while (currentPageId <= numPages) {
@@ -327,7 +329,7 @@
             }
             metaDataPage.getBuffer().putInt(NUM_PAGES_OFFSET, numPages);
             metaDataPage.getBuffer().putInt(NUM_HASHES_USED_OFFSET, numHashes);
-            metaDataPage.getBuffer().putLong(NUM_ELEMENTS_OFFSET, numElements);
+            metaDataPage.getBuffer().putLong(NUM_ELEMENTS_OFFSET, actualNumElements);
             metaDataPage.getBuffer().putLong(NUM_BITS_OFFSET, numBits);
             metaDataPage.getBuffer().putInt(VERSION_OFFSET, BLOCKED_BLOOM_FILTER_VERSION);
         }
@@ -337,6 +339,7 @@
             if (numPages == 0) {
                 throw HyracksDataException.create(ErrorCode.CANNOT_ADD_TUPLES_TO_DUMMY_BLOOM_FILTER);
             }
+            actualNumElements++;
             MurmurHash128Bit.hash3_x64_128(tuple, keyFields, SEED, hashes);
 
             long hash = Math.abs(hashes[0] % numBits);
@@ -367,7 +370,7 @@
             bufferCache.finishQueue();
             BloomFilter.this.numBits = numBits;
             BloomFilter.this.numHashes = numHashes;
-            BloomFilter.this.numElements = numElements;
+            BloomFilter.this.numElements = actualNumElements;
             BloomFilter.this.numPages = numPages;
             BloomFilter.this.version = BLOCKED_BLOOM_FILTER_VERSION;
         }
@@ -383,6 +386,5 @@
                 bufferCache.returnPage(metaDataPage, false);
             }
         }
-
     }
 }

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/2285
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib4054797d969efcfceb86f91b5321d34480e25c3
Gerrit-PatchSet: 1
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Luo Chen <[email protected]>
