Hi Bernhard, I'd be happy to do that. But first, I just want to make sure that I'm not missing anything.
On Sat, 28 Dec 2013 18:18:27 +0100 Bernhard Voelker <[email protected]> wrote: > Hi Jyothis, > > On 12/28/2013 03:36 PM, Jyothis V wrote: > > On Sat, 28 Dec 2013 13:49:53 +0100 > > Bernhard Voelker <[email protected]> wrote: > > > >> On 12/28/2013 10:53 AM, Jyothis V wrote: > >>> I would like to know why shuf.c is using reservoir sampling + > >>> write_permuted_output_reservoir rather than just using an > >>> inside-out version Fisher-Yates shuffle. > >> > >> Reservoir sampling has been added to shuf in the youngest coreutils > >> release 8.22 via this commit: > >> > >> http://git.sv.gnu.org/cgit/coreutils.git/commit/?id=20d7bce0 > > CCing Assaf who is the author of that commit. > > > Hi, thanks for the reply. I understand why something like reservoir > > sampling is needed. But in shuf.c, shuffling is done in two steps: > > 1) using reservoir sampling, an array of length head_length is obtained. > > At this stage, the array is not completely shuffled because the > > first head_length elements were copied verbatim. 2) This array is > > then randomly permuted while writing. My question is whether these > > two steps could be clubbed together, just as shown in the second > > algorithm given in the wikipedia page you mentioned. > > I didn't have a look into the Maths behind yet, nor was I involved > during that last improvement. Further improvement is maybe possible, > and the best way to push this is providing code. Are you willing > to propose such a patch? > > Thanks & have a nice day, > Berny -- Jyothis V <[email protected]>
