Re: new coreutil? shuffle - randomize file contents

Frederik Eaton Mon, 30 May 2005 06:04:51 -0700

On Wed, May 25, 2005 at 10:58:41AM +0100, James Youngman wrote:
> On Tue, May 24, 2005 at 09:55:35AM -0700, Paul Eggert wrote:
> 
> > That way, you could use, e.g.:
> > 
> >   sort -k 2,2 -k R
> > 
> > which would mean "sort by the 2nd field, but if there are ties then
> > sort the ties randomly".  "sort -R" would be short for "sort -k R".
> 
> Perhaps this approach avoids the problems that were discussed earlier
> regarding expectations about lines with identical keys "shuffling"
> together.
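For illustration only, here is a minimal Python sketch of what the proposed "sort -k 2,2 -k R" might do. It assumes that fields are split on whitespace (real sort field handling differs) and that each line draws one independent random tie-breaker. The function name sort_key_then_random is hypothetical and is not part of coreutils.

```python
import random


def sort_key_then_random(lines, field=2, rng=None):
    """Hypothetical model of the proposed `sort -k 2,2 -k R`: order lines by
    one whitespace-separated field (1-based) and break ties with a random
    number drawn once per line."""
    rng = rng or random.Random()

    def key(line):
        fields = line.split()
        # Lines lacking the field sort as if the field were empty.
        primary = fields[field - 1] if len(fields) >= field else ""
        # sorted() calls key() once per line, so each line gets one tie-breaker.
        return (primary, rng.random())

    return sorted(lines, key=key)


if __name__ == "__main__":
    print(sort_key_then_random(["a 2", "b 1", "c 2", "b 1"]))
```

Each line gets its own random number. As a result, even identical lines with equal second fields are ordered independently of each other, so duplicates are not kept together.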


I hope we agree that the earlier discussion concluded that both
behaviors - identical keys shuffling *together* vs. *apart* - would be
useful in different situations. We came up with a number of situations
in which one behavior or the other was necessary, and no one suggested
any other useful behaviors.

I think we have yet to consider other ways of getting these two
behaviors, however. For instance, "-s" could be seen as an instruction
to "last of all, sort by the input row number". If we implement
randomization as "sort by hash of keys", we get a "together" shuffle,
because identical keys produce identical hashes. Including the input
row number in this hash would then give the contrasting behavior that
Paul Eggert suggests above: the "apart" shuffle. So, with a rephrasing
of the "-s" option description, it might make sense for "-R" to
indicate the "together" behavior and "-Rs" to indicate the "apart"
behavior. In this case "-s" wouldn't mean "stable" so much as "depends
on input ordering". I don't know if this is sensible. Anyway, here is
the end of the previous thread:

http://lists.gnu.org/archive/html/bug-coreutils/2005-02/msg00005.html
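Here is a rough Python sketch of the hash-based idea above. It rests on some assumptions: the whole line serves as the key, SHA-256 with a per-run random salt stands in for the hash, and the helper names are hypothetical rather than anything in coreutils. Hashing only the key keeps identical lines adjacent ("together"). Mixing in the input row number gives every line an unrelated key ("apart").

```python
import hashlib
import os


def _salted_hash(data: bytes, salt: bytes) -> bytes:
    # A different salt per run gives a different pseudo-random order.
    return hashlib.sha256(salt + data).digest()


def shuffle_together(lines, salt):
    # Sort by hash(key): identical keys get identical hashes, so they stay adjacent.
    return sorted(lines, key=lambda line: _salted_hash(line.encode(), salt))


def shuffle_apart(lines, salt):
    # Sort by hash(row number, key): duplicate keys get unrelated hashes,
    # so they are scattered like any other lines.
    ranked = sorted(
        enumerate(lines),
        key=lambda pair: _salted_hash(f"{pair[0]}\0{pair[1]}".encode(), salt),
    )
    return [line for _, line in ranked]


if __name__ == "__main__":
    data = ["x", "y", "x", "z", "x"]
    salt = os.urandom(16)
    print(shuffle_together(data, salt))
    print(shuffle_apart(data, salt))
```

The only difference between the two functions is whether the row number goes into the hash. This mirrors the suggestion that "-s" could mean "depends on input ordering".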

Frederik


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
