Since you replied to my message rather than to David's, I assume you had seen
the [issue] I linked.
Masquerading an arbitrary binary search as a binary search on a list -- a
random access list, as Louis noted correctly -- can do something for you:
class Element implements Comparable<Element> { /* ... */ }
class MyList extends AbstractList<Element>
implements java.util.RandomAccess {
@Override
public int size() {
return Integer.MAX_VALUE;
}
@Override
public Element get(int index) {
return /* arbitrary computation */
}
}
int idx = Collections.binarySearch(new MyList().subList(0, 1024),
new Element( /* ... */ ));
But it's a trick which can do only so much; for instance:
- it cannot help you find a lower or an upper bound (i.e. the left-most or
right-most occurrence of a non-unique key),
- it is hard-to-read and error-prone in its own way
[issue]: https://bugs.openjdk.org/browse/JDK-8326330
> On 15 May 2024, at 23:19, Mateusz Romanowski <[email protected]>
> wrote:
>
> Hi,
> I would say it is not worth any effort.
>
> One can easily write an ad-hoc *local* adapter extending
> `java.util.AbstractList`..
> .. and immediately invoke existing `java.util.Collections::binarySearch`
> method.
>
> Cheers,
> MR
>
> On Thu, Apr 25, 2024 at 9:09 PM Pavel Rappo <[email protected]> wrote:
>
>
> > On 25 Apr 2024, at 19:41, David Lloyd <[email protected]> wrote:
> >
> > The JDK contains a few collection- and array-oriented implementations of
> > binary search. For the most part, they all work the same way: search the
> > target "thing" by index using the obvious bisection strategy, returning
> > either /index/ or /-index-1/ depending on whether it was found at the end
> > of the search.
> >
> > However, if you're doing a binary search on a thing that is not a list or
> > an array, you have two choices: try to make the thing you're searching on
> > implement List (often infeasible) or write your own binary search.
> >
> > I'm really tired of writing my own binary search. I've probably done it 50
> > times by now, each one using a slightly different access and/or compare
> > strategy, and every time is a new chance to typo something or get something
> > backwards, leading to irritating debugging sessions and higher dentist
> > bills.
>
> Can we safely say that it sets your teeth on edge?
>
> > It got me to thinking that it wouldn't be too hard to make a "general"
> > binary search which can search on anything, so that's what I did. I was
> > thinking that it might be interesting to try adding this into the JDK
> > somehow.
> >
> > This implementation is more or less what I now copy & paste to different
> > projects at the moment:
> >
> > public static <C, T> int binarySearch(C collection, int start, int end,
> > T key, Comparator<? super T> cmp, IntGetter<C, T> getter) {
> > int low = start;
> > int high = end - 1;
> > while (low <= high) {
> > int mid = low + high >>> 1;
> > int res = cmp.compare(getter.get(collection, mid), key);
> > if (res < 0) {
> > low = mid + 1;
> > } else if (res > 0) {
> > high = mid - 1;
> > } else {
> > return mid;
> > }
> > }
> > return -low - 1;
> > }
> > // (Plus variations for `Comparable` keys and long indices)
> >
> > A key problem with this approach is that in the JDK, there is no
> > `ObjIntFunction<T, R>` or `ObjLongFunction<T, R>` which would be necessary
> > to implement the "getter" portion of the algorithm (despite the existence
> > of the analogous `ObjIntConsumer<T>` and `ObjLongConsumer<T>`). So, I also
> > have to copy/paste a `IntGetter<T, R>`/`LongGetter<T, R>` as well, which is
> > annoying.
> >
> > A slightly less-good approach is for the general `binarySearch` method to
> > accept a `IntFunction<T>`/`LongFunction<T>`, and drop the `C collection`
> > parameter, like this:
> >
> > public static <T> int binarySearch(int start, int end, T key,
> > Comparator<? super T> cmp, IntFunction<T> getter) { ... }
> >
> > In this case, we can use the function types that the JDK already provides,
> > but we would very likely have to also create a capturing lambda (e.g.
> > `myList::get` instead of `List::get`). Maybe this isn't that bad of a
> > compromise.
> >
> > It would be possible to replace the existing `binarySearch` implementations
> > with delegating calls to a generalized implementation. For `Collections`,
> > the indexed version uses `List::get` and the iterator version uses a helper
> > lambda to move the iterator and get the result. For arrays, a lambda would
> > be provided which gets the corresponding array element. If the
> > non-capturing variant is used, then (on paper at least) this version should
> > perform similarly to the existing implementations, with less code needed
> > overall. However, if a capturing lambda is required (due to the
> > aforementioned lack of `ObjXFunction`), then this could be slightly
> > worse-performing than the existing implementation due to the construction
> > (and maybe dispatch) overhead of the lambda. Some real-world benchmarks
> > would have to be written with various-sized data sets.
> >
> > It would also be possible to produce primitive variations which operate on
> > int, float, long, and double values, using existing functions if capturing
> > is deemed "OK". It is also possible to produce a variation which uses a
> > `long` for the index, for huge data sets (for example, very large mapped
> > files using `MemorySegment`).
> >
> > Also unclear is: where would it live? `Collections`? Somewhere else?
> >
> > Any thoughts/opinions would be appreciated (even if they are along the
> > lines of "it's not worth the effort"). Particularly, any insight would be
> > appreciated as to whether or not this kind of hypothetical enhancement
> > would warrant a JEP (I wouldn't think so, but I'm no expert at such
> > assessments).
> >
> > --
> > - DML • he/him
>
> Have a look at this recently filed issue:
> https://bugs.openjdk.org/browse/JDK-8326330
>
> -Pavel
>
>