Great! Look forward to seeing this in the distribution.

Thanks,
--Jonathan

On Fri, May 14, 2004 at 12:01:15AM -0700, Paul Eggert wrote:
> Instead of adding a new option, I think I'd rather change 'sort' to
> cater to your (relatively common) case, rather than to the (relatively
> contrived) cases like `cat F | sort -m -o F - G' where people should
> know that they're getting into trouble anyway.
>
> Here's a proposed patch to solve your problem that way instead.
>
> 2004-05-13 Paul Eggert <[EMAIL PROTECTED]>
>
> Improve performance of `sort -m' on large files, at the cost of
> making some contrived examples unsafe. POSIX allows this
> optimization. Performance problem reported by Jonathan Baker in
> <http://mail.gnu.org/archive/html/bug-coreutils/2004-05/msg00071.html>.
>
> * src/sort.c (first_same_file): Do not treat input pipes
> differently from other files.
> * doc/coreutils.texi (sort invocation): Document that "sort -m -o F"
> might write F before reading all the input.
> * NEWS: Likewise.
>
> Index: NEWS
> ===================================================================
> RCS file: /home/meyering/coreutils/cu/NEWS,v
> retrieving revision 1.206
> diff -p -u -r1.206 NEWS
> --- NEWS 11 May 2004 16:48:42 -0000 1.206
> +++ NEWS 14 May 2004 06:35:30 -0000
> @@ -20,6 +20,12 @@ GNU coreutils NEWS
>
>  ** New features
>
> +  For efficiency, `sort -m' no longer copies input to a temporary file
> +  merely because the input happens to come from a pipe. As a result,
> +  some relatively-contrived examples like `cat F | sort -m -o F - G'
> +  are no longer safe, as `sort' might start writing F before `cat' is
> +  done reading it. This problem cannot occur unless `-m' is used.
> +
>    pwd now works even when run from a working directory whose name
>    is longer than PATH_MAX.
>
> Index: doc/coreutils.texi
> ===================================================================
> RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v
> retrieving revision 1.180
> diff -p -u -r1.180 coreutils.texi
> --- doc/coreutils.texi 9 May 2004 19:42:19 -0000 1.180
> +++ doc/coreutils.texi 14 May 2004 06:32:53 -0000
> @@ -3265,9 +3265,13 @@ starting with 1. So to sort on the seco
>  @opindex --output
>  @cindex overwriting of input, allowed
>  Write output to @var{output-file} instead of standard output.
> -If necessary, @command{sort} reads input before opening
> +Normally, @command{sort} reads all input before opening
>  @var{output-file}, so you can safely sort a file in place by using
>  commands like @code{sort -o F F} and @code{cat F | sort -o F}.
> +However, @command{sort} with @option{--merge} (@option{-m}) can open
> +the output file before reading all input, so a command like @code{cat
> +F | sort -m -o F - G} is not safe as @command{sort} might start
> +writing @file{F} before @command{cat} is done reading it.
>
>  @vindex POSIXLY_CORRECT
>  On newer systems, @option{-o} cannot appear after an input file if
> Index: src/sort.c
> ===================================================================
> RCS file: /home/meyering/coreutils/cu/src/sort.c,v
> retrieving revision 1.284
> diff -p -u -r1.284 sort.c
> --- src/sort.c 26 Apr 2004 15:37:33 -0000 1.284
> +++ src/sort.c 14 May 2004 05:45:52 -0000
> @@ -1878,9 +1878,7 @@ sortlines_temp (struct line *lines, size
>  }
>
>  /* Return the index of the first of NFILES FILES that is the same file
> -   as OUTFILE. If none can be the same, return NFILES. Consider an
> -   input pipe to be the same as OUTFILE, since the pipe might be the
> -   output of a command like "cat OUTFILE". */
> +   as OUTFILE. If none can be the same, return NFILES. */
>
>  static int
>  first_same_file (char * const *files, int nfiles, char const *outfile)
> @@ -1910,7 +1908,7 @@ first_same_file (char * const *files, in
>          ? fstat (STDIN_FILENO, &instat)
>          : stat (files[i], &instat))
>         == 0)
> -      && (S_ISFIFO (instat.st_mode) || SAME_INODE (instat, outstat)))
> +      && SAME_INODE (instat, outstat))
>      return i;
>  }
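
For anyone following along, the check that remains after this patch boils down to "same device and same inode". Below is a minimal sketch of that comparison, assuming a POSIX system. `same_file` is a hypothetical helper written for illustration, not a function in sort.c, and it only approximates what the SAME_INODE test in the patch does. Note that a FIFO on input no longer counts as a match, which is the point of the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper (not from sort.c): return true if paths A and B
   name the same file, judged by device and inode number.  Returns
   false if either path cannot be stat'ed.  */
static bool
same_file (char const *a, char const *b)
{
  struct stat sa, sb;

  assert (a != NULL && b != NULL);

  if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
    return false;

  /* Two names refer to the same file only if both fields match.  */
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With a check like this, `sort -o F F` is still recognized as reading and writing the same file, while `cat F | sort -m -o F - G` is not, because the pipe on standard input has a different inode than F.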
