We should talk about a design question surrounding binary search with `canFind`/`find` and possibly other linear-search functions.

Currently we have binary search in Phobos as part of std.range.SortedRange. Its interface is not compatible with `canFind` or `find` - you can't simply wrap the haystack in a SortedRange and pass it to an algorithm to give you logarithmic complexity.

The first question is whether this opportunistic upgrade is desirable - binary search has much better worst-case complexity than linear search, but it's not necessarily faster in the average case, which depends on the specific use pattern. One important thing to note is that, assuming the binary-search specialization is documented, the user can use `SortedRange.release` to explicitly request linear search.

Myself and others have sometimes mistakenly expected `find` and friends to be specialized for `SortedRange` inputs, opportunistically providing better worst-case complexity, but this is not the case. It seems simple at first glance, but the problem lies in the predicate - binary search can only be leveraged when the specific order is known:

---
auto data = assumeSorted([1, 2, 3]);

// Equality, so bsearch then trot left
auto result = data.find!((x, e) => x == e)(2); // default predicate
assert(result.equal([2, 3]));

// Opposite order and exclusive, bsearch then trot right
result = data.find!((x, e) => x > e)(2);
assert(result.equal([3]));

// Same order, bsearch then trot left.
// Compare first element as an optimization?
result = data.find!((x, e) => x < e)(0);
assert(result.empty);

struct S { string name; }
auto data2 = assumeSorted!((a, b) => a.name < b.name)(
    [S("a"), S("b"), S("c")]
);

// Same order and exclusive, but with a transformation, yikes...
auto result2 = data2.find!((x, e) => x.name < e.name)(S("b"));
assert(result2.equal(data2));
---

Identifying the characteristics described in the code comments above is the biggest problem: predicate functions don't have any notion of equality.

String lambdas can be compared for equality, but they're really fragile: "a == b" != "b == a" etc. Besides, string lambdas are undesirable for other reasons and should be phased out in the long-term[1].

Someone suggested defining a standard set of predicates, making it look like this:

---
auto data = assumeSorted!less([1, 2, 3]); // Would be default

// Equality, so bsearch then trot left
auto result = data.find!equality(2); // Would be default
---

Templates can be compared with __traits(isSame, ...), so this approach seems feasible. If we want to do this, this seems like the most realistic approach. I'm not sure if it can be made to work when transformation of arguments is involved, but it might still be worth doing.

Another issue is that SortedRange's interface does not actually support all the above patterns because it splits the result into 3 instead of 2; we would need to amend SortedRange to support the inverse functions of lowerBound and upperBound.

So, what does the community think? Desirable, or not? Thoughts about implementation?

[1] http://forum.dlang.org/post/[email protected]

Reply via email to