On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)

I'd like to note here that, if you make use of the same P(x) many times (instead of different predicates on each call), it makes sense to spend O(n) time and memory filtering by that predicate and storing the result, and then answer each query in O(1).

3) Permutation walk:

        auto r = ... /* elements of A */
        foreach (i; iota(0 .. r.length).randomPermutation) {
                if (P(r[i])) return r[i];
        }
        /* no elements satisfy P(x) */

Advantages: if an element that satisfies P(x) is found early, the loop will terminate before n iterations. This seems like the best of
   both worlds of (1) and (2), except:

Disadvantages: AFAIK, generating a random permutation of indices from 0 .. n requires at least O(n) time, so any advantage we may have had
   seems to be negated.

Is there an algorithm for *incrementally* generating a random permutation of indices? If there is, we could use that in (3) and thus achieve the best of both worlds between early termination if an element satisfying P(x) is found, and guaranteeing termination after n iterations if no elements satisfying P(x) exists.

Yes, there is.

There are actually two variations of Fisher-Yates shuffle:
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)

1.
    auto p = n.iota.array;
    foreach (pos; 0..n) {
        auto otherPos = uniform (0, pos + 1);
        swap (p[pos], p[otherPos]);
    }

When we look at this after k-th iteration, the first k elements are randomly and uniformly permuted, and the rest (n-k) are left untouched.

2.
    auto p = n.iota.array;
    foreach (pos; 0..n) {
        auto otherPos = uniform (pos, n);
        swap (p[pos], p[otherPos]);
    }

When we look at this after k-th iteration, the first k elements are a random combination of all n elements, and this combination is randomly and uniformly permuted. So, the second variation is what we need: each new element is randomly and uniformly selected from all the elements left. Once we get the first element satisfying the predicate, we can just terminate the loop. If there are m out of n elements satisfying the predicate, the average number of steps is n/m.

Now, the problem is that both of these allocate n "size_t"-s of memory to start with. And your problem does not allow to shuffle the elements of the original array in place, so we do need an external permutation for these algorithms. However, there are at least two ways to mitigate that:

(I)
We can allocate the permutation once using n time and memory, and then, on every call, just reuse it in its current state in n/m time. It does not matter if the permutation is not identity permutation: by symmetry argument, any starting permutation will do just fine.

(II)
We can store the permutation p in an associative array instead of a regular array, actually storing only the elements accessed at least once, and assuming other elements to satisfy the identity p[x] = x. So, if we finish in n/m steps on average, the time and extra memory used will be O(n/m) too. I can put together an example implementation if this best satisfies your requirements.

Ivan Kazmenko.

Reply via email to