nsivabalan edited a comment on issue #976: [HUDI-106] Adding support for 
DynamicBloomFilter
URL: https://github.com/apache/incubator-hudi/pull/976#issuecomment-548883578
 
 
   > Left some comments. Can we also add a test for the "dynamic" nature 
of the filter, e.g. having more entries should result in a larger filter with 
the same fp ratio? Also, how are you enforcing a maximum dynamic bloom filter 
size? Can you share data on how big the bloom filter would be if you, say, 
wrote 1M keys at an fpp ratio of 10^-9?
   
   A few questions/clarifications:
   - I guess you can't bound the size of a dynamic bloom filter. The size grows 
according to the number of entries added. The initial number of entries passed 
in is used to set the minimum size. 
   - I am trying to find a way to test the FP ratio, but I am not sure how to 
test that. 
   - I was able to verify that adding more entries to the filter than the 
initial size increases the size of the bloom filter. 
   - Here are the sizes of the dynamic bloom filter with an error rate of 
10^-9 and an initial number of entries of 10k:
    Size of bloom with 100 entries     = 71940 bytes   ~= 72 KB
    Size of bloom with 1000 entries    = 71940 bytes   ~= 72 KB
    Size of bloom with 10000 entries   = 71940 bytes   ~= 72 KB
    Size of bloom with 100000 entries  = 719088 bytes  ~= 719 KB
    Size of bloom with 1000000 entries = 7190568 bytes ~= 7.2 MB
   - Not sure if we really need a (unit) test to ensure that the size grows 
as the number of entries added increases. The only assertion we can make is 
that the size is greater than it is when fewer entries are added. 
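
   A minimal sketch of how the size growth and the FP ratio might be checked 
empirically. This is not the Hudi or Hadoop implementation: the BloomFilter 
and DynamicBloomFilter classes, the SHA-256 double hashing, and the rule of 
adding one row per n keys are all assumptions made for illustration. It uses 
n = 1000 and p = 0.01, because measuring a rate of 10^-9 directly would need 
billions of probes.

```python
# Illustrative only: a toy dynamic Bloom filter, not the Hadoop/Hudi code.
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter sized for n entries at false-positive rate p."""

    def __init__(self, n, p):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing (an assumption): derive k positions from one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class DynamicBloomFilter:
    """Starts a new fixed-size row once the current row has taken n keys."""

    def __init__(self, n, p):
        self.n, self.p = n, p
        self.rows = [BloomFilter(n, p)]
        self.keys_in_current_row = 0

    def add(self, key):
        if self.keys_in_current_row >= self.n:
            self.rows.append(BloomFilter(self.n, self.p))
            self.keys_in_current_row = 0
        self.rows[-1].add(key)
        self.keys_in_current_row += 1

    def might_contain(self, key):
        # A key may live in any row, so every row has to be consulted.
        return any(row.might_contain(key) for row in self.rows)

    def size_in_bytes(self):
        return sum(len(row.bits) for row in self.rows)


def build(num_keys, n=1000, p=0.01):
    f = DynamicBloomFilter(n, p)
    for i in range(num_keys):
        f.add(f"key-{i}")
    return f


def false_positive_rate(f, probes=20000):
    # Probe with keys that were never added; any hit is a false positive.
    hits = sum(f.might_contain(f"absent-{i}") for i in range(probes))
    return hits / probes


if __name__ == "__main__":
    for num_keys in (100, 1000, 10000):
        f = build(num_keys)
        print(num_keys, f.size_in_bytes(), false_positive_rate(f))
```

   Under these assumptions the size stays flat up to n keys and then grows in 
whole rows, which is the same shape as the figures above. Because a lookup 
checks every row, the observed false-positive rate in this sketch rises as 
rows are added, so a test would probably have to assert an upper bound on the 
rate rather than expect it to stay at p.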

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
