Paul, Thank you for submitting a patch of your own.
> > I still think what I sent is ready to apply. > > I took a look at the patch quoted in that email and have some comments > and suggestions. > > It's more conservative to minimize the changes to the ISAAC code. The > simplest thing to do for now is to put it into a rand-isaac.c file, > which is included both by shred.c and by sort.c. This results in a > trivial change to shred.c (remove a block of code and replace it by an > #include of a copy of that code) and greater confidence that we > haven't messed up the random-number generator somehow. I didn't quite > follow all the ISAAC_WORDS stuff, but since the patch you sent uses > the same value for both sort and shred, we might as well leave it a > constant, to minimize the number of changes to the code. Jim agreed that the way I have chosen to share the ISAAC code would be suitable. I have tested it and it produces the same random values as before, when initialized with the same number of words and the same seed. > Eventually, once we factor in /dev/random / /dev/urandom stuff, this > can be revisited, but in the meantime it's better to minimize changes > to the code. I think my changes were small. I tried the #include method but it generated a bunch of preprocessor errors which I wasn't able to fix. It appears that you have managed to eliminate them. I still think that my method is cleaner since it relies on linking rather than direct inclusion, but if you are indeed going to remove the ISAAC code then I suppose it doesn't matter. However, what do you plan to replace ISAAC with in sort? > As a documentation issue, I'd feel more comfortable calling this a > "hashed" sort rather than a "random" sort. Using the word "random" > has the wrong connotations, since the sort doesn't generate a random > permutation of the input. So I would prefer option names like > --hash-sort and -h. I assume you are referring to the same-keys-sort-together issue, and not the randomness-vs-pseudorandomness issue. It is true that we are sorting via a hash, but that is an implementation detail. The effect is to create a random total ordering on the keys, and sort via this ordering. Because it is a total ordering, keys which are the same must be equal. Yet calling the option --random-total-order-sort would be redundant because all sorting in the 'sort' command is done with respect to a total order. (even if the equivalence classes are not singletons, as with "3" == "3.0" for numeric sorting, it is still a total order) You can look up "total order" on Wikipedia, for instance. Also, as we've discussed, and as you've argued, an alternative option which caused same-keys-sort-apart behavior probably doesn't belong in the sort command - it wouldn't be meaningful to specify a "key" with it, and it wouldn't interact with any of the other sorting methods. So there is no potential for confusing the --random-sort option with a possible future variant. So I would like to leave the name as it is. I hope my reasoning is clear to you. > When two different keys hash to the same value, the comparison should > fall back on byte-by-byte comparison of the keys; otherwise the user > can't rely on identical keys sorting next to each other, which is one > of the desired properties here. Can you give an example of two such keys? In any case, I've changed my local version to do this. > I don't see why get_hash needs to extract a subset of data from the mm > buffer. Why not just compare the contents of the entire buffer? Is > there something nonrandom about that? If so, perhaps we should go the > whole hog and use the "correct" way to get at it. (*) What I've implemented is fast and it works. I'm sure there are other ways of doing it, but looking at the ISAAC code, I think that what I've done uses suitably random numbers. > The usual style is to use NULL rather than 0 for the null pointer. Fixed locally. > Here's a proposed patch embodying the above suggestions. I'm still a > bit puzzled about the part marked (*) above, though. Here are the results for shuffling /usr/share/dict/words: your version: ./inst4/bin/sort -h < /usr/share/dict/words > /dev/null 14.04s user 0.01s system 99% cpu 14.168 total my version: ./inst1/bin/sort -R < /usr/share/dict/words > /dev/null 1.23s user 0.01s system 99% cpu 1.243 total My version is over 11 times faster on this example. (Also, doing a normal sort takes about 0.7 seconds, so you can see where the lower limit is) However, if you fix the speed problem - which should be easy - and change the option names back to that of my original proposal, and change my email address in the ChangeLog back to [EMAIL PROTECTED] (which is more permanent), then I would be happy with the patch that you have submitted. Frederik _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils