On Sat, 9 Oct 2010 14:52:41 +0200
Ole Tange <ta...@gnu.org> wrote:

> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow a competing
> strategies are. GNU sort is two magnitudes slower than unsort and more
> than one magnitude slower than perl:
>
> $ time unsort file
> real 0m1.388s
>
> $ unsort --version
> unsort 1.1.2
>
> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real 0m6.621s
>
> $ time sort -R file
> real 4m8.403s
>
> $ sort --version
> sort (GNU coreutils) 8.5
>
> What is even scarier: sort without -R is faster than sort -R:
>
> $ time sort file
> real 0m53.553s
>
> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

On my system, locale settings seem to impact the runtime significantly:

$ wc -l bigfile
1000000 bigfile

$ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
real 1m29.302s
user 1m21.009s
sys 0m0.155s

$ time LC_ALL=C sort -R bigfile > /dev/null
real 0m38.881s
user 0m35.276s
sys 0m0.118s

However, shuf is much faster, and seems mostly unaffected by the locale used:

$ time shuf bigfile > /dev/null
real 0m1.044s
user 0m0.833s
sys 0m0.042s

--
D.
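
P.S. One possible reason for the gap, as far as I understand the coreutils documentation: sort -R orders lines by a random hash of their keys, so it still performs a full sort and identical lines end up next to each other, whereas shuf only produces a random permutation of the lines. The sketch below shows that difference in behaviour; the sample data and the variable names (tmp, groups, total) are made up for illustration, and it assumes GNU coreutils (sort, shuf, mktemp) is installed.

```shell
# Hypothetical sample: three distinct values, each appearing three times.
tmp=$(mktemp)
printf '%s\n' a b c a b c a b c > "$tmp"

# sort -R sorts by a random hash of each key, so equal lines are adjacent;
# uniq therefore leaves exactly one line per distinct value.
groups=$(LC_ALL=C sort -R "$tmp" | uniq | wc -l | tr -d ' ')

# shuf permutes lines without looking at their content; every input line
# is emitted once, and duplicates are not deliberately grouped.
total=$(shuf "$tmp" | wc -l | tr -d ' ')

echo "distinct runs after sort -R: $groups"
echo "lines emitted by shuf: $total"

rm -f "$tmp"
```

So if duplicate lines should be scattered like any others, shuf looks like the better fit anyway; sort -R keeps them grouped.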