Dan Hipschman <[EMAIL PROTECTED]> wrote:

> On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote:
>> Not to look the gift horse in the mouth, but it'd be nice
>> if you wrote ChangeLog entries, too. And even (gasp! :-)
>> a test case or two. Of course, we'd expect such a test case
>> (probably named tests/misc/sort-compress, and based on
>> tests/sample-test) to have this line in it:
>>
>> . $srcdir/../very-expensive
>>
>> If you don't have time for that, I'll take care of it, eventually.
>
> I'm not going to stop you :-) I just haven't had the time to look into
> it yet, so I've just been running the coreutils tests and then my own
> tests. I was planning on adding the tests, as you say, "eventually".
>
>> Default to just "gzip", not /bin/gzip. The latter may not exist;
>> your patch already handles that, but /bin/gzip may not be the first
>> gzip in PATH. Also, don't bother with the access-X_OK check.
>> There's no point in incurring even such a small overhead in the
>> general case, when no temporary is used.
>
> The reason I put the access check in there is that if we default to gzip
> and it doesn't exist, then of course the exec will fail, the child will
This is a good argument for using libz by default, not a separate gzip
program. Why incur the overhead of an exec when we don't need to?
Now, I'm convinced that sort should provide built-in support for both
gzip and bzip2. How to select built-in vs. actually exec the program
is something to think about... Of course it should still be possible
to specify some other program.

> fail, and this will cause sort to fail when it really should just not do
> the compression, or try another default if something suitable exists
> (what about compress?). How about we just delay the determination of

compress? no thank you! :-)

> the compress program until it's actually needed (e.g., in create_temp
> right before "if (compress_program)" we have "if (compress_program_known)"
> and inside the body we check the environment variable and/or do access
> checks on possible defaults)?
>
>> But please address the FIXME I've added.
>
> If we can't fork a compression process, it's not the end of the world.
> We just don't compress that file and sort will still work. If we can't
> fork a decompression process, we can't continue the sort. So I figure
> we'll just try twice to fork compression processes, and if we can't do
> it after 1 sec, we're probably wasting more time waiting to fork than we
> would doing disk access. However, we really need to be able to fork
> decompression processes, so we can afford to wait a really long time for
> it. I was considering making the number of tries for decompression
> processes even larger (now, it'll wait about 2 min before giving up).
>
>> Have you considered using the gnulib hash module rather than
>> rolling your own? There are examples in many of coreutils/src/*.c.
>
> I'm not familiar with gnulib, so I didn't know a hash module existed or
> think to look for one. Looking at it now, though, it seems it will be
> slower because of its abstraction, unless the table fills up to the

Performance isn't the issue here, but code-reuse. I would be very
surprised if changing hash table implementations has any measurable
effect on sort's performance.

> point where it would be faster to access if it grew. I'd prefer (since
> it seems to be my time we're talking about), to leave it the way it is
> because it's simple, and see if the gnulib module is faster later.
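
For reference, the "libz by default" idea above amounts to writing the
temporary through zlib's gzFile interface instead of piping through a
child gzip. A minimal sketch of what that could look like (the function
name and buffer handling are only illustrative, not code from the patch):

    #include <zlib.h>

    /* Write one buffer to a gzip-compressed temporary file via libz,
       with no fork/exec involved.  Illustrative sketch only.  */
    static int
    write_compressed_temp (char const *tmpname, char const *buf, unsigned len)
    {
      gzFile f = gzopen (tmpname, "wb");
      if (!f)
        return -1;
      int n = gzwrite (f, buf, len);        /* returns 0 on error */
      if (gzclose (f) != Z_OK || n == 0)
        return -1;
      return 0;
    }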
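
The delayed lookup Dan describes (only deciding on a compressor the first
time create_temp actually needs one) might look roughly like this; the
environment-variable name and the fallback candidates here are
placeholders, not what the patch uses:

    #include <stdbool.h>
    #include <stdlib.h>
    #include <unistd.h>

    static char const *compress_program;
    static bool compress_program_known;

    /* Decide which compressor to use, but only on first use, so that a
       sort needing no temporaries pays nothing.  NULL means "don't
       compress".  */
    static char const *
    get_compress_program (void)
    {
      if (!compress_program_known)
        {
          compress_program_known = true;
          /* "SORT_COMPRESS" is a placeholder variable name.  */
          compress_program = getenv ("SORT_COMPRESS");
          if (!compress_program || !*compress_program)
            {
              /* Fall back to gzip only if it looks runnable.  */
              if (access ("/usr/bin/gzip", X_OK) == 0
                  || access ("/bin/gzip", X_OK) == 0)
                compress_program = "gzip";
              else
                compress_program = NULL;
            }
        }
      return compress_program;
    }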
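
The retry policy Dan outlines (give up quickly when forking a compressor,
since sort can simply skip compression, but keep trying much longer for a
decompressor, without which the sort cannot finish) could be expressed as
something like the following; the counts and delays are illustrative:

    #include <sys/types.h>
    #include <unistd.h>

    /* Try to fork up to MAX_TRIES times, sleeping with doubling delays
       between attempts.  Returns the fork() result, or -1 if every
       attempt failed.  */
    static pid_t
    fork_with_retry (unsigned int max_tries)
    {
      unsigned int delay = 1;        /* seconds */
      for (unsigned int tries = 0; ;)
        {
          pid_t pid = fork ();
          if (0 <= pid)
            return pid;
          if (++tries == max_tries)
            return -1;
          sleep (delay);
          delay *= 2;
        }
    }

With these hypothetical numbers, fork_with_retry (2) waits about one
second before giving up on a compressor, while fork_with_retry (8) waits
1+2+...+64 seconds, roughly the two minutes mentioned for decompression.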
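
For comparison with a hand-rolled table, using gnulib's hash module (the
lib/hash.h interface used elsewhere in coreutils/src) mostly comes down
to supplying a hasher and a comparator. A rough, generic sketch, not the
table from the patch:

    #include <stdbool.h>
    #include <string.h>
    #include "hash.h"

    static size_t
    str_hasher (void const *entry, size_t table_size)
    {
      return hash_string (entry, table_size);
    }

    static bool
    str_comparator (void const *a, void const *b)
    {
      return strcmp (a, b) == 0;
    }

    static int
    example (void)
    {
      Hash_table *ht = hash_initialize (16, NULL, str_hasher,
                                        str_comparator, NULL);
      if (!ht)
        return 1;
      /* hash_insert returns NULL only on memory exhaustion.  */
      if (hash_insert (ht, "some key"))
        {
          /* hash_lookup returns the stored entry, or NULL if absent.  */
          char const *found = hash_lookup (ht, "some key");
          (void) found;
        }
      hash_free (ht);
      return 0;
    }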