Hi Jyothis, On 12/28/2013 03:36 PM, Jyothis V wrote: > On Sat, 28 Dec 2013 13:49:53 +0100 > Bernhard Voelker <[email protected]> wrote: > >> On 12/28/2013 10:53 AM, Jyothis V wrote: >>> I would like to know why shuf.c is using reservoir sampling + >>> write_permuted_output_reservoir rather than just using an >>> inside-out version Fisher-Yates shuffle. >> >> Reservoir sampling has been added to shuf in the youngest coreutils >> release 8.22 via this commit: >> >> http://git.sv.gnu.org/cgit/coreutils.git/commit/?id=20d7bce0
CCing Assaf who is the author of that commit. > Hi, thanks for the reply. I understand why something like reservoir > sampling is needed. But in shuf.c, shuffling is done in two steps: > 1) using reservoir sampling, an array of length head_length is obtained. > At this stage, the array is not completely shuffled because the > first head_length elements were copied verbatim. 2) This array is > then randomly permuted while writing. My question is whether these > two steps could be clubbed together, just as shown in the second > algorithm given in the wikipedia page you mentioned. I didn't have a look into the Maths behind yet, nor was I involved during that last improvement. Further improvement is maybe possible, and the best way to push this is providing code. Are you willing to propose such a patch? Thanks & have a nice day, Berny
