On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:

Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time, nor is the
relative proportion of elements that satisfy P(x) or not.


Works only with a random access range and I haven't done the full analysis (and I'm a bit rusty on probability), but my first thought was random divide and conquer:

ElementType!A* select(A,P)(A r)
{
  // Recursion anchor
  if (r.length == 1) {
    if (P(r[0])) return r[0];
    else return null;
// Recurse randomly with p=0.5 into either the left, or right half of r
  } else {
    ElementType!A* e;
    ElementType!A[][2] half;
    half[0] = r[0..floor(r.length/2)];
    half[1] = r[ceil(r.length/2)..$];

    ubyte coinFlip = uniform(0,1) > 0;
    // Recurse in one half and terminate if e is found there
    e = select(half[coinFlip]);
    if (e) return e;
    // Otherwise, recurse into other half
    return select(half[1 - coinFlip]);
  }
}

As stated above, I haven't done the full analysis, but intuitively speaking (which can be wrong, of course), the p=0.5 recursion ought satisfy the condition of elements satisfying P(x) being chosen uniformly; also intuitively, I'd expect the expected runtime for a uniform distribution of elements satisfying P(x) to be around O(log N). Worst-case would be that it has to inspect every element in r once => O(N)

Reply via email to