Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2324#discussion_r189647876
  
    --- Diff: datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomDataMapWriter.java ---
    @@ -86,12 +86,31 @@ public void onBlockletStart(int blockletId) {
       protected void resetBloomFilters() {
         indexBloomFilters.clear();
         List<CarbonColumn> indexColumns = getIndexColumns();
    +    int[] stats = calculateBloomStats();
         for (int i = 0; i < indexColumns.size(); i++) {
    -      indexBloomFilters.add(BloomFilter.create(Funnels.byteArrayFunnel(),
    -          bloomFilterSize, bloomFilterFpp));
    +      indexBloomFilters
    +          .add(new CarbonBloomFilter(stats[0], stats[1], Hash.MURMUR_HASH, compressBloom));
         }
       }
     
    +  /**
    +   * It calculates the bits size and number of hash functions to calculate bloom.
    +   */
    +  private int[] calculateBloomStats() {
    +    /*
    +     * n: how many items you expect to have in your filter
    +     * p: your acceptable false positive rate
    +     * Number of bits (m) = -n*ln(p) / (ln(2)^2)
    +     * Number of hashes(k) = m/n * ln(2)
    --- End diff --
    
    Can't, as `k` depends on `m`: the number of bits has to be computed before the number of hashes.
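
To illustrate the dependency, here is a minimal standalone sketch of the two formulas quoted in the diff comment. It is not the actual body of `calculateBloomStats()` in the PR; the class name `BloomSizing`, the helper names, the explicit `n` and `p` parameters, and the rounding choices (ceiling for `m`, round-to-nearest with a floor of 1 for `k`) are all assumptions made for this example.

```java
// Hypothetical helper, not part of CarbonData: shows why k must be derived after m.
final class BloomSizing {
  private BloomSizing() {}

  /** Number of bits: m = -n * ln(p) / (ln(2)^2), rounded up (rounding is an assumption). */
  static int optimalNumOfBits(long n, double p) {
    if (n <= 0 || p <= 0.0 || p >= 1.0) {
      throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
    }
    double ln2 = Math.log(2);
    double m = -n * Math.log(p) / (ln2 * ln2);
    // Clamp so the result still fits in an int.
    return (int) Math.min(Integer.MAX_VALUE, Math.ceil(m));
  }

  /** Number of hashes: k = m/n * ln(2); it takes m as input, so m comes first. */
  static int optimalNumOfHashes(long n, long m) {
    // At least one hash function is always needed.
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
  }

  /** Returns {m, k}, mirroring the stats[0] / stats[1] order used in the diff. */
  static int[] calculateBloomStats(long n, double p) {
    int m = optimalNumOfBits(n, p);
    int k = optimalNumOfHashes(n, m);
    return new int[] {m, k};
  }
}
```

Under these assumptions, calling `BloomSizing.calculateBloomStats(1000, 0.01)` first sizes the bit array and only then derives the hash count from it, which is why the two values cannot be computed independently.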

