Re: feature request: gzip/bzip support for sort

Dan Hipschman Sun, 14 Jan 2007 11:15:55 -0800

On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
> Thanks.  I like the idea of compression, but before we get into the
> details of your patch, what do you mean by there not being a
> performance improvement with this patch?  What's the holdup on
> performance?  It seems to me that compression ought to be a real win.


Well, after profiling, it turns out I just have a really bad way of
associating compression information with file names:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 47.91      4.92     4.92     9269     0.00     0.00  find_temp
 13.06      6.26     1.34                             memcoll
  9.65      7.25     0.99     9269     0.00     0.00  fillbuf
  7.80      8.05     0.80      198     0.00     0.02  mergefps
  6.92      8.76     0.71   625637     0.00     0.00  compare
  2.63      9.03     0.27   199885     0.00     0.00  compress_and_write_bytes
  2.63      9.30     0.27                             xmemcoll
  0.78      9.38     0.08   187247     0.00     0.00  read_and_decompress_bytes
  0.78      9.46     0.08    24444     0.00     0.00  mergelines
  0.78      9.54     0.08                             xalloc_die
  0.68      9.61     0.07     6414     0.00     0.00  load_compression_buffer
  0.68      9.68     0.07    59777     0.00     0.00  write_bytes

I associated compression buffers with filenames in the temp file list,
and used a linear search to retrieve them.  It seemed simple and I was
just prototyping the patch at the time, but man, I guess I grossly
underestimated the number of temp files being created.  Actually, this
example isn't realistic since it was run with the -S 1k flag, but
without the -S flag, even the largest file on my hard drive (~80M)
didn't use any temp files so I had to resort to this artificial
approach.  I can replace the linear search with something that runs in
constant time and then I think compression will be a win.  My mistake
for not profiling earlier.
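
To make the "constant time" idea concrete, here is a rough sketch of one
way the lookup could be done: a small chained hash table keyed by temp
file name.  This is only an illustration, not the code from the patch.
The names temp_entry, temp_insert, temp_lookup, the compress_state field
and the bucket count are all hypothetical, and lookups are constant time
only on average, assuming the names hash reasonably evenly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-temp-file record; the field names are illustrative
   and are not taken from the patch.  */
struct temp_entry
{
  char *name;                /* owned copy of the temp file name */
  void *compress_state;      /* opaque compression buffer or state */
  struct temp_entry *next;   /* next entry in the same bucket */
};

enum { N_BUCKETS = 1024 };   /* arbitrary size chosen for the sketch */

static struct temp_entry *buckets[N_BUCKETS];

/* Hash NAME with 32-bit FNV-1a and reduce it to a bucket index.  */
static size_t
hash_name (char const *name)
{
  uint32_t h = 2166136261u;
  for (; *name; name++)
    {
      h ^= (unsigned char) *name;
      h *= 16777619u;
    }
  return h % N_BUCKETS;
}

/* Associate COMPRESS_STATE with NAME.  Return 0 on success, -1 if
   memory is exhausted.  Duplicate names are not checked for.  */
static int
temp_insert (char const *name, void *compress_state)
{
  assert (name != NULL);
  size_t len = strlen (name) + 1;
  struct temp_entry *e = malloc (sizeof *e);
  if (!e)
    return -1;
  e->name = malloc (len);
  if (!e->name)
    {
      free (e);
      return -1;
    }
  memcpy (e->name, name, len);
  e->compress_state = compress_state;

  /* Push onto the front of the bucket's chain.  */
  size_t i = hash_name (name);
  e->next = buckets[i];
  buckets[i] = e;
  return 0;
}

/* Return the state stored for NAME, or NULL if NAME is unknown.
   Only the entries in one bucket are scanned, not the whole list.  */
static void *
temp_lookup (char const *name)
{
  assert (name != NULL);
  for (struct temp_entry *e = buckets[hash_name (name)]; e; e = e->next)
    if (strcmp (e->name, name) == 0)
      return e->compress_state;
  return NULL;
}
```

An alternative that avoids the lookup altogether would be to keep the
compression state in the temp file list node itself and pass the node
around instead of the bare file name.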

Also, you should note that I'm running all these tests on a very old
computer (Pentium II, 400MHz), since it's all this broke college student
has access to at the moment.  On a faster processor, the results should
be better.  Still, let me modify my patch and I think the problem will
go away.

> If it's not a win, we shouldn't bother with LZO; instead, we should
> use an algorithm that will typically be a clear win -- or, if there
> isn't any such algorithm, we shouldn't hardwire any algorithm at all.

I think it will be a win, so I'll leave these additional thoughts alone
for right now.

Thanks,
Dan



_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils