aherbert commented on pull request #258:
URL:
https://github.com/apache/commons-collections/pull/258#issuecomment-994837812
I did not have time for a full review. Here are some thoughts.
How much of an issue is it to change Bloom filter indexes to a long from an
int? Limiting to an int prevents supporting very large Bloom filters.
On a quick look through you have static helper methods in static classes
that are public, e.g.
```java
BitMap.isPositive(int);
BitMap.checkRange(int, int);
```
For a first release it is better to minimise the public API.
This is an expensive way to break a forEach consumer pattern as the stack
trace will be generated for the new exception:
```java
@Override
public boolean contains(IndexProducer indexProducer) {
try {
indexProducer.forEachIndex(idx -> {
if (!indices.contains(idx)) {
throw new NoMatchException();
}
});
return true;
} catch (NoMatchException e) {
return false;
}
}
```
This is a key feature of a Bloom filter. To be able to check if a filter
contains a hash and return false early if possible. It should be fast and may
warrant not using IntConsumer from java.util.function so that the IndexConsumer
can return false:
```java
interface IndexConsumer {
boolean accept(int value);
}
```
Anything passing values to the IndexConsumer can stop if false is returned.
An alternative is to store a static final exception that can be thrown avoiding
creating a new exception and stack trace. The performance of the alternatives
should be investigated.
If this is required in BloomFilter to provide a single representation for IO:
```java
static int[] asIndexArray(BloomFilter filter) {
List<Integer> lst = new ArrayList<Integer>();
filter.forEachIndex(lst::add);
return lst.stream().mapToInt(Integer::intValue).toArray();
}
```
Then: (1) indexes cannot be supported using a long in the future as the type
is int[] but also returning arrays will be limited to the maximum array length;
(2) it should be made more efficient to avoid boxing ints and then using a
potentially very large stream.
Would this work to convert a BitMapProducer to an IndexProducer. It allows
skipping empty words:
```java
LongConsumer longConsumer = new LongConsumer() {
int wordIdx = 0;
@Override
public void accept(long word) {
int i = wordIdx;
while (word != 0) {
if ((word & 1) == 1) {
consumer.accept(i);
}
word >>>= 1;
i++;
}
wordIdx += 64;
}
};
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]