On Thu, 18 Jan 2007, Jim Meyering wrote:

I've done some more timings, this time with two more input sizes.
Here's the summary, comparing plain sort with sort --compress=gzip:

 2.7GB:   6.6% speed-up
 10.0GB: 17.8% speed-up

It would be interesting to see the individual stats returned by wait4(2) for each child, to separate the CPU seconds spent in sort itself from those spent in the compression/decompression forks.
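
For illustration, here's a minimal standalone sketch of the kind of accounting I mean (not the actual coreutils code; gzip -1 on /etc/services is just a stand-in for one compression fork): wait4(2) hands back an rusage for the child, which can be reported separately from the parent's own getrusage(RUSAGE_SELF) figures.

  /* Minimal sketch (not coreutils code): fork one compressor child,
     discard its output, and report its CPU time (as returned by
     wait4) separately from the parent's own usage.  */
  #include <stdio.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <sys/types.h>
  #include <sys/time.h>
  #include <sys/resource.h>
  #include <sys/wait.h>

  int
  main (void)
  {
    pid_t pid = fork ();
    if (pid < 0)
      {
        perror ("fork");
        return 1;
      }
    if (pid == 0)
      {
        /* Child: stand-in for one compression fork.  */
        int devnull = open ("/dev/null", O_WRONLY);
        if (devnull >= 0)
          dup2 (devnull, STDOUT_FILENO);
        execlp ("gzip", "gzip", "-1", "-c", "/etc/services", (char *) NULL);
        _exit (127);
      }

    int status;
    struct rusage child_ru;
    if (wait4 (pid, &status, 0, &child_ru) < 0)
      {
        perror ("wait4");
        return 1;
      }

    struct rusage self_ru;
    getrusage (RUSAGE_SELF, &self_ru);

    printf ("child  user CPU: %ld.%06ld s\n",
            (long) child_ru.ru_utime.tv_sec,
            (long) child_ru.ru_utime.tv_usec);
    printf ("parent user CPU: %ld.%06ld s\n",
            (long) self_ru.ru_utime.tv_sec,
            (long) self_ru.ru_utime.tv_usec);
    return 0;
  }

In sort itself the per-child figures would simply be accumulated across all the compressor children and reported alongside sort's own usage.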

I think allowing an environment variable to define the compressor is a good idea, so long as there's a corresponding --nocompress override available from the command line.
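
Something along these lines is the precedence I'd expect (just a sketch; the GNUSORT_COMPRESSOR variable name and the --nocompress flag are placeholders, not options sort actually has):

  /* Sketch of the precedence I have in mind; GNUSORT_COMPRESSOR and
     --nocompress are placeholder names, not existing sort options.  */
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  static char const *
  choose_compressor (char const *compress_opt, bool no_compress)
  {
    if (no_compress)           /* --nocompress always wins */
      return NULL;
    if (compress_opt)          /* --compress=PROG beats the environment */
      return compress_opt;
    char const *env = getenv ("GNUSORT_COMPRESSOR");
    if (env && *env)           /* otherwise honour the environment */
      return env;
    return NULL;               /* default: leave temporaries uncompressed */
  }

  int
  main (void)
  {
    char const *prog = choose_compressor (NULL, false);
    printf ("compressor: %s\n", prog ? prog : "(none)");
    return 0;
  }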

 $ seq 9999999 > k
 $ cat k k k k k k k k k > j
 $ cat j j j j > sort-in
 $ wc -c sort-in
 2839999968 sort-in

I had to use "seq -f %.0f" to get this file size (plain seq prints values this large in exponential notation).

With --compress=gzip:
 $ /usr/bin/time ./sort -T. --compress=gzip < sort-in > out
 814.07user 29.97system 14:50.16elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
 0inputs+0outputs (4major+2821589minor)pagefaults 0swaps

There's a big difference in the time spent on gzip compression depending on the -1/-9 option (the default is -6). For a seq-generated data set similar to the one above, I get:

 gzip -1: User time (seconds): 48.63; output size is 6% of input
 gzip -9: User time (seconds): 952.97; output size is 3% of input

Decompression time for both tests shows less variation (25s vs 21s).

This suggests that the elapsed time of the sort can be improved by trading compression ratio for less CPU time. Obviously the disk latency is a critical factor.
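
As a crude back-of-envelope check (the 50 MB/s disk figure below is invented; the CPU seconds are the gzip -1/-9 user times above, and read-back, decompression and overlap with sorting are all ignored), compression only helps elapsed time when the CPU seconds it costs are smaller than the write seconds it saves:

  /* Back-of-envelope comparison; the disk rate is an assumed figure.  */
  #include <stdio.h>

  int
  main (void)
  {
    double temp_bytes = 2.8e9;    /* roughly the temporary data written */
    double disk_rate  = 50e6;     /* assumed disk throughput, bytes/s */

    /* gzip -1: output 6% of input, 48.63 user seconds (measured above).  */
    double saved_1 = temp_bytes * (1 - 0.06) / disk_rate;
    /* gzip -9: output 3% of input, 952.97 user seconds (measured above).  */
    double saved_9 = temp_bytes * (1 - 0.03) / disk_rate;

    printf ("gzip -1: ~%.0f s of writes saved for 48.63 s of CPU\n", saved_1);
    printf ("gzip -9: ~%.0f s of writes saved for 952.97 s of CPU\n", saved_9);
    return 0;
  }

With these (made-up) disk numbers, -9's extra ~900 CPU seconds buy only a couple more seconds of saved writes, so a fast, light compression level looks like the sensible default.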


Cheers,
Phil

