Opportunistic worst-case complexity upgrade? aka binary search with `find`

Jakob Ovrum via Digitalmars-d Wed, 23 Jul 2014 18:26:26 -0700

We should talk about a design question surrounding binary search with `canFind`/`find` and possibly other linear-search functions.

Currently we have binary search in Phobos as part of std.range.SortedRange. Its interface is not compatible with `canFind` or `find` - you can't simply wrap the haystack in a SortedRange and pass it to an algorithm to give you logarithmic complexity.
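For reference, here is a minimal sketch of the binary-search interface that SortedRange exposes today. The sample data is made up for illustration; the member functions (`contains`, `lowerBound`, `equalRange`, `upperBound`, `trisect`) are the existing Phobos ones.

```d
import std.algorithm : equal;
import std.range : assumeSorted;

void main()
{
    // Illustrative data, assumed sorted by the default "a < b".
    auto data = assumeSorted([1, 2, 3, 3, 5]);

    // Membership test via binary search.
    assert(data.contains(3));
    assert(!data.contains(4));

    // The three pieces around a value: less, equal, greater.
    assert(data.lowerBound(3).equal([1, 2]));
    assert(data.equalRange(3).equal([3, 3]));
    assert(data.upperBound(3).equal([5]));

    // trisect returns all three pieces at once as a tuple.
    auto parts = data.trisect(3);
    assert(parts[0].equal([1, 2]));
    assert(parts[1].equal([3, 3]));
    assert(parts[2].equal([5]));
}
```

None of these return "the haystack advanced to the first match", which is what `find` returns.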

The first question is whether this opportunistic upgrade is desirable - binary search has much better worst-case complexity than linear search, but it's not necessarily faster in the average case, which depends on the specific use pattern. One important thing to note is that, assuming the binary-search specialization is documented, the user can use `SortedRange.release` to explicitly request linear search.
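A small sketch of that opt-out, using only existing Phobos calls: `release` hands back the underlying range, so `find` sees a plain array and searches linearly whether or not a specialization exists.

```d
import std.algorithm : find;
import std.range : assumeSorted;

void main()
{
    auto sorted = assumeSorted([1, 2, 3]);

    // release returns the wrapped int[]; the sortedness information is gone.
    int[] raw = sorted.release;

    // Plain linear search on the array.
    assert(raw.find(2) == [2, 3]);
}
```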

Others and I have sometimes mistakenly expected `find` and friends to be specialized for `SortedRange` inputs, opportunistically providing better worst-case complexity, but this is not the case. It seems simple at first glance, but the problem lies in the predicate - binary search can only be leveraged when the specific order is known:


---
import std.algorithm : equal, find;
import std.range : assumeSorted;

auto data = assumeSorted([1, 2, 3]);

// Equality, so bsearch then trot left
auto result = data.find!((x, e) => x == e)(2); // default predicate
assert(result.equal([2, 3]));

// Opposite order and exclusive, bsearch then trot right
result = data.find!((x, e) => x > e)(2);
assert(result.equal([3]));

// Same order, bsearch then trot left.
// Compare first element as an optimization?
result = data.find!((x, e) => x < e)(0);
assert(result.empty);

struct S { string name; }
auto data2 = assumeSorted!((a, b) => a.name < b.name)(
    [S("a"), S("b"), S("c")]
);

// Same order and exclusive, but with a transformation, yikes...
auto result2 = data2.find!((x, e) => x.name < e.name)(S("b"));
assert(result2.equal(data2));
---

Identifying the characteristics described in the code comments above is the biggest problem: predicate functions don't have any notion of equality.

String lambdas can be compared for equality, but they're really fragile: "a == b" != "b == a" etc. Besides, string lambdas are undesirable for other reasons and should be phased out in the long-term[1].

Someone suggested defining a standard set of predicates, making it look like this:


---
auto data = assumeSorted!less([1, 2, 3]); // Would be default

// Equality, so bsearch then trot left
auto result = data.find!equality(2); // Would be default
---

Templates can be compared with __traits(isSame, ...), so this approach seems feasible. If we want to do this, this seems like the most realistic approach. I'm not sure if it can be made to work when transformation of arguments is involved, but it might still be worth doing.
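A rough sketch of how such a dispatch might look. Everything named here is hypothetical: `less`, `equality` and `findSorted` are not part of Phobos, and the sketch assumes the haystack is a SortedRange over an array ordered by the default "a < b". It only recognizes the equality predicate and falls back to linear search for anything else.

```d
import std.algorithm : equal, find;
import std.range : assumeSorted;

// Hypothetical standard predicates; these names are not in Phobos.
bool less(A, B)(A a, B b) { return a < b; }
bool equality(A, B)(A a, B b) { return a == b; }

// Hypothetical dispatcher. Assumes `sorted` wraps an array sorted by "a < b".
auto findSorted(alias pred, R, V)(R sorted, V value)
{
    static if (__traits(isSame, pred, equality))
    {
        // Recognized predicate: binary search for the first element
        // that is not less than `value`.
        immutable skip = sorted.lowerBound(value).length;
        auto raw = sorted.release;
        if (skip < raw.length && raw[skip] == value)
            return raw[skip .. $];
        return raw[$ .. $]; // no match: empty tail, as `find` would return
    }
    else
    {
        // Unrecognized predicate: plain linear search.
        return sorted.release.find!pred(value);
    }
}

void main()
{
    auto data = assumeSorted([1, 2, 3]);

    // Takes the binary-search branch.
    assert(findSorted!equality(data, 2).equal([2, 3]));
    assert(findSorted!equality(data, 4).length == 0);

    // A lambda is not the same symbol as `equality`, so this is linear.
    assert(findSorted!((x, e) => x > e)(data, 2).equal([3]));
}
```

Note that a lambda spelling out `x == e` would also take the linear branch, which is exactly the limitation of comparing predicates by identity.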

Another issue is that SortedRange's interface does not actually support all the above patterns because it splits the result into 3 instead of 2; we would need to amend SortedRange to support the inverse functions of lowerBound and upperBound.
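To illustrate what those inverse functions might compute, here is a sketch written as free functions on top of the existing interface. The names `notLowerBound` and `notUpperBound` are made up for this example, and the slicing assumes the underlying range is an array.

```d
import std.algorithm : equal, find;
import std.range : assumeSorted;

// Hypothetical: everything lowerBound(value) leaves out, i.e. all x >= value.
auto notLowerBound(R, V)(R sorted, V value)
{
    immutable skip = sorted.lowerBound(value).length;
    return sorted.release[skip .. $];
}

// Hypothetical: everything upperBound(value) leaves out, i.e. all x <= value.
auto notUpperBound(R, V)(R sorted, V value)
{
    immutable keep = sorted.length - sorted.upperBound(value).length;
    return sorted.release[0 .. keep];
}

void main()
{
    auto data = assumeSorted([1, 2, 2, 3]);

    assert(notLowerBound(data, 2).equal([2, 2, 3]));
    assert(notUpperBound(data, 2).equal([1, 2, 2]));

    // Same result as a linear find with an "at least" predicate.
    int[] raw = [1, 2, 2, 3];
    assert(notLowerBound(data, 2).equal(raw.find!((x, e) => x >= e)(2)));
}
```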

So, what does the community think? Desirable, or not? Thoughts about implementation?

[1] http://forum.dlang.org/post/[email protected]
