> >Phil> Is it that the app must guarantee all lines of a > >Phil> non-seekable stdin must have an equal chance of any sort order? > > > >See my comment to James above. I think one need not make this > >guarantee, since only a tiny fraction of possible sort orders will be > >able to be tried by the user. However, I think it would be true for my > >proposal (and the one that James suggested to Davis) if one took the > >random number generator or hash function to be variable or unspecified > >during analysis of the algorithm. > > Thinking further on this, I don't think it matters to the guts of sort > whether the ultimate random key is based on position hash or PRNG.
No, it doesn't ... but I don't understand why you bring it up in this context. > >One possibility for an efficient random permutation of a large file is > >a program which scans the file once and records the location of each > >line in an index, then applies a uniformly distributed random > >permutation to this line index by stepping through it in order and > >swapping the current entry with an entry chosen at random from the set > >of later entries, and finally steps through the index again and > >dereferences each line entry and prints out the corresponding line in > >the original file. > > That approach is fine on seekable files, but the user may wish to > shuffle from stdin. sort already knows how to do this. > > Here's a concrete example of Paul's suggestion as I understand it: I understand Paul's suggestion. I was throwing out the other algorithm since Davis was looking for something more efficient. Regards, Frederik _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils