Re: new coreutil? shuffle - randomize file contents

Frederik Eaton Thu, 02 Jun 2005 21:18:32 -0700

> >Phil> Is it that the app must guarantee all lines of a
> >Phil> non-seekable stdin must have an equal chance of any sort order?
> >
> >See my comment to James above. I think one need not make this
> >guarantee, since only a tiny fraction of possible sort orders will be
> >able to be tried by the user. However, I think it would be true for my
> >proposal (and the one that James suggested to Davis) if one took the
> >random number generator or hash function to be variable or unspecified
> >during analysis of the algorithm.
> 
> Thinking further on this, I don't think it matters to the guts of sort
> whether the ultimate random key is based on position hash or PRNG.


No, it doesn't ... but I don't understand why you bring it up in this
context.

> >One possibility for an efficient random permutation of a large file is
> >a program which scans the file once and records the location of each
> >line in an index, then applies a uniformly distributed random
> >permutation to this line index by stepping through it in order and
> >swapping the current entry with an entry chosen at random from the set
> >of later entries, and finally steps through the index again and
> >dereferences each line entry and prints out the corresponding line in
> >the original file.
> 
> That approach is fine on seekable files, but the user may wish to
> shuffle from stdin. sort already knows how to do this.
> 
> Here's a concrete example of Paul's suggestion as I understand it:

I understand Paul's suggestion. I was throwing out the other algorithm
since Davis was looking for something more efficient.

Regards,

Frederik


_______________________________________________
Bug-coreutils mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/bug-coreutils

Re: new coreutil? shuffle - randomize file contents

Reply via email to