Jason House wrote:
Andrei Alexandrescu Wrote:
I'm writing a range to select a random sampling of a forward range.
The number of elements in the forward range is already known.
The strategy is: at any moment, I know I need to select k random
samples out of n. To implement popFront, I generate a random number
in [0, n - k]. That's the probability that the next item in the
input will be part of the selection. If the random number is 0,
then the element got selected. Otherwise, I must ignore that guy so
I popFront the input range, decrease n by one, and repeat.
This seems reasonable but somehow the selection comes skewed: the
elements towards the beginning of the range are less represented
than the ones towards the end. Where am I wrong?
Andrei
You have your probability wrong. The selection probability is k/n.
Pick a number in [0,n) and pick the element if the result is in
[0,k). If you do select an item, decrement k. Always decrement n when
popping, regardless of if the element is selected.
Great, thanks, that works perfect. If I want to choose _toSelect stuff
out of _available items, this works to position to the next element:
for (;;)
{
auto r = uniform(0, _available);
if (r < _toSelect)
{
// chosen!
return;
}
// not chosen, retry
assert(!_input.empty);
_input.popFront;
--_available;
assert(_available > 0);
}
As an aside, it looks like the right-open range for random numbers is a
good default.
Andrei