I think that this works.  A version which I find easier to prove is correct
is this:

i = 0
for each element in input
       i++
       pick a uniform integer from [0, i)
       if (i < k) output[i] = element

The basic idea here is that this is identical to shuffling the entire input
and keeping only the first k elements of the shuffle.  Since we know that we
are only keeping k elements, we know that any elements that get swapped out
of the first k will never be seen again.  That means that the swaps can be
reduced to assignments and all swaps to positions outside of 0..k-1 can be
forgotten entirely.

On Fri, Jul 3, 2009 at 11:17 AM, Sean Owen <[email protected]> wrote:

>     with probability 1/(output size + 1)
>        pick a random output element to remove
>        add element to output
>



-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)

Reply via email to