I found Reservoir-Sampling algorithms of time complexity O(n(1+log(N/n))) by Kim-Hung Li , ACM Transactions on Mathematical Software Vol 20 No 4 Dec 94 p481-492. He mentions algorithm Z and K and proposed 2 improved versions alg L and M. Algorith L is really easy to implement but relatively slow, M doesn't look very difficult and is the fastest. Heberto Ghezzo McGill University Montreal - Canada
Quoting François Pinard <[EMAIL PROTECTED]>: > [Martin Maechler] > > > FrPi> Suppose the file (or tape) holds N records (N is not known > > FrPi> in advance), from which we want a sample of M records at > > FrPi> most. [...] If the algorithm is carefully designed, when > > FrPi> the last (N'th) record of the file will have been processed > > FrPi> this way, we may then have M records randomly selected from > > FrPi> N records, in such a a way that each of the N records had an > > FrPi> equal probability to end up in the selection of M records. I > > FrPi> may seek out for details if needed. > > >[...] I'm also intrigued about the details of the algorithm you > >outline above. > > I went into my old SPSS books and related references to find it for you, > to no avail (yet I confess I did not try very hard). I vaguely remember > it was related to Spearman's correlation computation: I did find notes > about the "severe memory limitation" of this computation, but nothing > about the implemented workaround. I did find other sampling devices, > but not the very one I remember having read about, many years ago. > > On the other hand, Googling tells that this topic has been much studied, > and that Vitter's algorithm Z seems to be popular nowadays (even if not > the simplest) because it is more efficient than others. Google found > a copy of the paper: > > http://www.cs.duke.edu/~jsv/Papers/Vit85.Reservoir.pdf > > Here is an implementation for Postgres: > > http://svr5.postgresql.org/pgsql-patches/2004-05/msg00319.php > > yet I do not find it very readable -- but this is only an opinion: I'm > rather demanding in the area of legibility, while many or most people > are more courageous than me! :-). > > -- > François Pinard http://pinard.progiciels-bpi.ca > > ______________________________________________ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html > ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html