[GitHub] [commons-collections] Claudenw commented on pull request #402: COLLECTIONS-843: Implement Layered Bloom filter

via GitHub Thu, 20 Jul 2023 22:51:15 -0700


Claudenw commented on PR #402:
URL: 
https://github.com/apache/commons-collections/pull/402#issuecomment-1645018264


   Recent changes:
   # Main code
   ## BloomFilterProducer
   
   * Cleaned up documentation to specify that implementations should specify if 
filters used in `forEachBloomFilterPair()` and `asBloomFilterArray` are copies 
or references.
   * Layer and depth order discussion moved to `LayerManager`.
   * `asBloomFilterArray` coding change accepted, now uses List internally and 
returns array.
   * `forEachBloomFilterPair` documentation corrected.
   * added `flatten()`
   
   ## CountingPredicate
   
   * documentation updated
   
   ## LayerManager
   
   * documentation updated.
   * `ExtendCheck` and `Cleanup` classes changed to have only static methods, 
no public static variables.
   * added parameter checks for `ExtendCheck.advanceOnCount#breakAt`, 
`ExtendCheck.advanceOnSaturation#maxN`, and `Cleanup.onMaxSize#maxSize`
   * removed the ``ExtendCheck.advanceOnSaturation(int)` method
   * LinkedList is required as the Bloom filter storage implementation (and 
`cleanup` argument) as we need rapid access to the intermediate layers in order 
to return the specific layers.
   * added method `addFilter()` to verify that the `filterSupplier` did not 
return `null`.
   * optimised `copy()` so that no filters are created during the construction
   * moved the check for empty filter in `next()` to be part of the cleanup 
consumer.
   * the `get()` method needs to be public so that access to the specific layer 
is permitted as required by the `LayeredBloomFilter`.
   * `target()` renamed to `getTarget()`
   * `clear()` now calls `addFilter()` to ensure a non-null filter is added.
   * `Builder.withX` pattern change to `Builder.setX`
   * Builder method documentation added.
   * changed Builder exceptions from IllegalArgument to NPE
   * The quesiton of whether to use a Builder here is still open.
   
   ## BloomFilter
   
   * added `isEmpty()`
   
   ## LayeredBloomFilter
   
   * Changed the hyphens in the cite for the article but left the "V1"s in the 
page numbers as that is how the publisher asks for it to be cited.
   * updated documentation
   * Access to layers is provided because there can be additional info 
associated with the Bloom filter. In the paper and in many implementations the 
find returns the deepest node that matches. Our finds returns the indexes that 
match so that the application may select the deepest, the shallowest, the one 
from the middle, or perhaps all of them and do some subsequent processing.
   * removed `clear(int)` and `target()`
   * added `contains(BloomFilterProducer)` to handle the special case.
   * Continued to return 0 (zero) for the `characteristics()` as the layered 
Bloom filter makes some assumptions that work best when it is considered 
non-sparse. As we don't know what other characteristics may be added in the 
future, it seems that defining the characteristics here is correct. In addition 
there is no requirement that all the layers be of the same characteristic.
   * notes on `forEachIndex()` in the general case notes for the method 
indicate that index ordering and uniqueness are not guaranteed.
   * `forEachBitMap()` may only return the nbumber of BitMaps specified by the 
Shape so it must use `flatten()`.
   * simplified `forEachBitMap()`, `estimateN()`, and `estimateUnion()` to 
explicitly use `flatten()`
   * removed redundant `estimateUnion()`
   * the use case for `next()` is using `ExtendCheck.neverAdvance()` and then 
triggering the advance on some outside state change/trigger.
   
   ## Shape
   
   * clarified `estimageMaxN()` documentation
   * fixed formatting
   
   ## WrappedBloomFilter
   
   * argument for not moving code to test directory:  Let's consider this for a 
moment. The example in the test code is precisely what the Kafka KIP-936 is 
looking for. Now, I don't want to implement a "strange" bloom filter in this 
class, but I do want an easy way to extend the classes we have. Since they are 
all final decorating them with a wrapper is the only reasonable solution. So 
let's provide the solution. If we move it out, then I will have to ask to move 
it back when the KIP-936 begins development using this library. I believe they 
are starting that process now.
   * Updated documentation
   
   # Testing code
   
   ## DefaultBloomFilterTest
   
   * Provided access to the two (2) BloomFilter test classes that implement the 
Defaults (Sparse and non-Sparse).
   
   
   ## LayerManagerTest
   
   * Significantly modified `testOnMaxSize()` and `testAdvanceOnCount` to test 
wider range of values, including breaking out parameterized tests.
   * Adjusted to match the methods now provided in LayerManager
   
   ## LayeredBloomFilterTest
   
   * Removed checks of zero depth as it can not be returned.
   * Reduced time in `testExpiration()` to approx 600ms
   
   ## TestingHashers
   
   * converted to `ThreadLocalRandom.current()`
   * left `randomHasher()` as public so that it can be utilized in 3rd party 
Bloom filter testing.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [commons-collections] Claudenw commented on pull request #402: COLLECTIONS-843: Implement Layered Bloom filter

Reply via email to