On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
> Thanks. I like the idea of compression, but before we get into the
> details of your patch, what do you mean by there not being a
> performance improvement with this patch? What's the holdup on
> performance? It seems to me that compression ought to be a real win.
Well, after profiling, it turns out I just have a really bad way of associating compression information with file names:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 47.91      4.92     4.92     9269     0.00     0.00  find_temp
 13.06      6.26     1.34                             memcoll
  9.65      7.25     0.99     9269     0.00     0.00  fillbuf
  7.80      8.05     0.80      198     0.00     0.02  mergefps
  6.92      8.76     0.71   625637     0.00     0.00  compare
  2.63      9.03     0.27   199885     0.00     0.00  compress_and_write_bytes
  2.63      9.30     0.27                             xmemcoll
  0.78      9.38     0.08   187247     0.00     0.00  read_and_decompress_bytes
  0.78      9.46     0.08    24444     0.00     0.00  mergelines
  0.78      9.54     0.08                             xalloc_die
  0.68      9.61     0.07     6414     0.00     0.00  load_compression_buffer
  0.68      9.68     0.07    59777     0.00     0.00  write_bytes

I associated compression buffers with file names in the temp file list and used a linear search to retrieve them. It seemed simple, and I was just prototyping the patch at the time, but man, I grossly underestimated the number of temp files being created.

Admittedly, this example isn't realistic, since it was run with -S 1k. Without the -S flag, even the largest file on my hard drive (~80 MB) didn't use any temp files, so I had to resort to this artificial setup.

I can replace the linear search with a constant-time lookup, and then I think compression will be a win. My mistake for not profiling earlier.

Also, note that I'm running all these tests on a very old computer (Pentium II, 400 MHz), since it's all this broke college student has access to at the moment. On a faster processor, the results should be better. Still, let me fix my patch first; I think the problem will go away.

> If it's not a win, we shouldn't bother with LZO; instead, we should
> use an algorithm that will typically be a clear win -- or, if there
> isn't any such algorithm, we shouldn't hardwire any algorithm at all.

I think it will be a win, so I'll set these additional thoughts aside for now.
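For illustration, one way to get the constant-time lookup mentioned above is a chained hash table keyed by temp file name. This is a minimal sketch, assuming compression state is still looked up by file name; `struct comp_buf`, `comp_table_insert`, and `comp_table_lookup` are hypothetical names, not taken from the patch or from sort.c. Each lookup hashes the name and scans one short bucket chain instead of walking the whole temp-file list (the kind of scan that would explain `find_temp` dominating the profile).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for per-file compression state; the patch's
   real buffer type is not shown in this message.  */
struct comp_buf
{
  char *data;
  size_t len;
};

/* One chained hash-table entry: temp file name -> compression state.  */
struct comp_entry
{
  char *name;
  struct comp_buf *buf;
  struct comp_entry *next;
};

/* Fixed bucket count, chosen arbitrarily to keep the sketch short.  */
#define COMP_TABLE_SIZE 1024

static struct comp_entry *comp_table[COMP_TABLE_SIZE];

/* djb2-style string hash, reduced to a bucket index.  */
static size_t
hash_name (char const *s)
{
  size_t h = 5381;
  for (unsigned char const *p = (unsigned char const *) s; *p; p++)
    h = h * 33 + *p;
  return h % COMP_TABLE_SIZE;
}

/* Record BUF as the compression state for temp file NAME.
   Returns 0 on success, -1 on allocation failure.
   (Freeing entries is omitted from this sketch.)  */
int
comp_table_insert (char const *name, struct comp_buf *buf)
{
  struct comp_entry *e = malloc (sizeof *e);
  if (!e)
    return -1;
  e->name = malloc (strlen (name) + 1);
  if (!e->name)
    {
      free (e);
      return -1;
    }
  strcpy (e->name, name);
  e->buf = buf;

  /* Push onto the front of this name's bucket chain.  */
  size_t h = hash_name (name);
  e->next = comp_table[h];
  comp_table[h] = e;
  return 0;
}

/* Return the compression state for NAME, or NULL if none is recorded.
   Only one bucket chain is scanned, so the cost is expected O(1)
   while chains stay short.  */
struct comp_buf *
comp_table_lookup (char const *name)
{
  for (struct comp_entry *e = comp_table[hash_name (name)]; e; e = e->next)
    if (strcmp (e->name, name) == 0)
      return e->buf;
  return NULL;
}
```

The fixed bucket count keeps the sketch short; with very many temp files (as with -S 1k), a real version would need to grow the table to keep chains short. If the merge code can carry a pointer to the temp-file node itself rather than its name, storing the compression state directly in that node would avoid the lookup altogether.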
Thanks,
Dan

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils