I think that this works. A version which I find easier to prove is correct
is this:
i = 0
for each element in input
i++
pick a uniform integer from [0, i)
if (i < k) output[i] = element
The basic idea here is that this is identical to shuffling the entire input
and keeping only the first k elements of the shuffle. Since we know that we
are only keeping k elements, we know that any elements that get swapped out
of the first k will never be seen again. That means that the swaps can be
reduced to assignments and all swaps to positions outside of 0..k-1 can be
forgotten entirely.
On Fri, Jul 3, 2009 at 11:17 AM, Sean Owen <[email protected]> wrote:
> with probability 1/(output size + 1)
> pick a random output element to remove
> add element to output
>
--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)