There is a discussion at http://news.ycombinator.com/item?id=4833546 comparing shuf with dimsum (http://blog.noblemail.ca/2012/11/how-to-get-random-lines-out-of-file-or.html)
dimsum uses reservoir sampling so it should have a constant memory footprint proportional to -n (though due to a bug with fix in progress, it currently does not) I note that shuf does not have a constant memory footprint. This can be observed, i.e. , with yes | shuf -n 5 and running top to watch the memory grow seeming boundlessly. 1) what is the underlying algorithm ?? 2) should/could it be changed to use "reservoir sampling" to obtain benefits of constant memory footprint? Thanks for your consideration! Malcolm Cook Computational Biology - Stowers Institute for Medical Research
