Re: Shuf reservoir sampling

Jyothis V Sat, 28 Dec 2013 10:04:40 -0800

Hi Bernhard, I'd be happy to do that. But first, I just want to make sure that 
I'm not missing anything.



On Sat, 28 Dec 2013 18:18:27 +0100
Bernhard Voelker <[email protected]> wrote:

> Hi Jyothis,
> 
> On 12/28/2013 03:36 PM, Jyothis V wrote:
> > On Sat, 28 Dec 2013 13:49:53 +0100
> > Bernhard Voelker <[email protected]> wrote:
> > 
> >> On 12/28/2013 10:53 AM, Jyothis V wrote:
> >>> I would like to know why shuf.c is using reservoir sampling +
> >>> write_permuted_output_reservoir rather than just using an
> >>> inside-out version Fisher-Yates shuffle.
> >>
> >> Reservoir sampling has been added to shuf in the youngest coreutils
> >> release 8.22 via this commit:
> >>
> >>   http://git.sv.gnu.org/cgit/coreutils.git/commit/?id=20d7bce0
> 
> CCing Assaf who is the author of that commit.
> 
> > Hi, thanks for the reply. I understand why something like reservoir
> > sampling is needed. But in shuf.c, shuffling is done in two steps:
> > 1) using reservoir sampling, an array of length head_length is obtained.
> > At this stage, the array is not completely shuffled because the
> > first head_length elements were copied verbatim. 2) This array is
> > then randomly permuted while writing. My question is whether these
> > two steps could be clubbed together, just as shown in the second
> > algorithm given in the wikipedia page you mentioned.
> 
> I didn't have a look into the Maths behind yet, nor was I involved
> during that last improvement. Further improvement is maybe possible,
> and the best way to push this is providing code. Are you willing
> to propose such a patch?
> 
> Thanks & have a nice day,
> Berny


-- 
Jyothis V <[email protected]>

Re: Shuf reservoir sampling

Reply via email to