Pádraig Brady wrote:
> This looks good. In the attached version I've tweaked the
> commit message, updated THANKS and changed the test to a+x
> (which is required for one of the checker scripts).
>
> cheers,
> Pádraig.
>
> From f7d704a7b17bfab3df91f57cef0c715e642f54af Mon Sep 17 00:00:00 2001
> From: Paul Eggert <egg...@cs.ucla.edu>
> Date: Fri, 13 Mar 2009 15:48:30 -0700
> Subject: [PATCH] sort: handle fd exhaustion better when merging
>
> This is an alternative to my 9 March patch labeled "Silently lower
> nmerge; don't (sometimes incorrectly) range-check"
> <http://lists.gnu.org/archive/html/bug-coreutils/2009-03/msg00070.html>.
> It differs by not using 'dup' to probe for extra file descriptors;
> instead, it simply calls 'open' (and 'pipe') to open files and pipes,
> until one of these calls fails due to file descriptor exhaustion; it
> then backs off by 1, does a merge with the files that it has opened,
> and then retries with the (now-smaller) number of files.
>
> This patch requires quite a few more changes to the source code than
> the earlier patch, but it is in some sense "better" because it doesn't
> need to call "dup" ahead of time in order to decide whether "open" or
> "pipe" will fail. Also, it's more robust in the case where "open" or
> "pipe" fails with errno==EMFILE because some system-wide limit is
> exhausted. It may be useful to apply all of this patch, along with
> the other patch's test case (which is somewhat different).
Thanks Pádraig,
And thanks to Paul and students.

I've taken Paul's advice from the log about adding the test from the
other patch (and then removed that sentence from the log).
I left the original patch intact for now. Of the following,

  Subject: [PATCH 2/4] tweak test
  Subject: [PATCH 3/4] tests: add another sort/nmerge test
  Subject: [PATCH 4/4] tests: adjust sort-continue not to fail under valgrind

I expect to merge 2/4 into the original and leave 3 and 4 separate.

Patch 3/4 is derived from the earlier patch. I removed the sort.c and
tests/misc/sort-merge changes, but kept the new test and documentation
improvements. Also, I've added the five names to THANKS and changed
the log to list Alexander Nguyen's last name.

I'll wait a day or so, in case there are comments.

-------------------

From e5bb3274c9f76fe47a6b2178ce1748dd7ff58b26 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 13 Mar 2009 15:48:30 -0700
Subject: [PATCH 1/4] sort: handle fd exhaustion better when merging

This is an alternative to my 9 March patch labeled "Silently lower
nmerge; don't (sometimes incorrectly) range-check"
<http://lists.gnu.org/archive/html/bug-coreutils/2009-03/msg00070.html>.
It differs by not using 'dup' to probe for extra file descriptors;
instead, it simply calls 'open' (and 'pipe') to open files and pipes,
until one of these calls fails due to file descriptor exhaustion; it
then backs off by 1, does a merge with the files that it has opened,
and then retries with the (now-smaller) number of files.

This patch requires quite a few more changes to the source code than
the earlier patch, but it is in some sense "better" because it doesn't
need to call "dup" ahead of time in order to decide whether "open" or
"pipe" will fail. Also, it's more robust in the case where "open" or
"pipe" fails with errno==EMFILE because some system-wide limit is
exhausted.

* src/sort.c (create_temp_file): New arg SURVIVE_FD_EXHAUSTION.
(stream_open): New function, containing guts of xfopen. (xfopen): Use it. (pipe_fork): Set errno on failure. (maybe_create_temp): New function, containing guts of create_temp. (create_temp): Use it. (open_temp): Distinguish failures due to file descriptor exhaustion from other failures, and on fd exhaustion return a notice to caller rather than dying. Don't test execlp's return value; when it returns, it *always* returns -1. (open_input_files): New function. (mergefps): New arg FPS. It's now the caller's responsibility to open the input and output files. All callers changed. (mergefiles): New function. (avoid_trashing_input, merge): Handle the case where a single merge can't merge as much as we wanted due to file descriptor exhaustion, by merging as much as we can and then retrying. * tests/Makefile.am (TESTS): Add misc/sort-continue. * tests/misc/sort-continue: New file. * THANKS: Add Glen Lenker and Matt Pham who coauthored this patch. --- THANKS | 2 + src/sort.c | 292 ++++++++++++++++++++++++++++++++++------------ tests/Makefile.am | 1 + tests/misc/sort-continue | 47 ++++++++ 4 files changed, 268 insertions(+), 74 deletions(-) create mode 100755 tests/misc/sort-continue diff --git a/THANKS b/THANKS index f894e1d..3b933ff 100644 --- a/THANKS +++ b/THANKS @@ -204,6 +204,7 @@ Geoff Whale geo...@cse.unsw.edu.au Gerald Pfeifer ger...@pfeifer.com Gerhard Poul gp...@gnu.org Germano Leichsenring germ...@jedi.cs.kobe-u.ac.jp +Glen Lenker glen.len...@gmail.com Göran Uddeborg goe...@uddeborg.se Guochun Shi g...@ncsa.uiuc.edu GOTO Masanori go...@debian.or.jp @@ -365,6 +366,7 @@ Mate Wierdl m...@moni.msci.memphis.edu Matej Vela mv...@public.srce.hr Matt Kraai kr...@ftbfs.org Matt Perry m...@primefactor.com +Matt Pham mattvp...@gmail.com Matt Schalit mscha...@pacbell.net Matt Swift sw...@alum.mit.edu Matthew Arnison maf...@cat.org.au diff --git a/src/sort.c b/src/sort.c index 7b0b064..ced0f2d 100644 --- a/src/sort.c +++ b/src/sort.c @@ -732,10 +732,13 @@ exit_cleanup (void) } 
/* Create a new temporary file, returning its newly allocated tempnode. - Store into *PFD the file descriptor open for writing. */ + Store into *PFD the file descriptor open for writing. + If the creation fails, return NULL and store -1 into *PFD if the + failure is due to file descriptor exhaustion and + SURVIVE_FD_EXHAUSTION; otherwise, die. */ static struct tempnode * -create_temp_file (int *pfd) +create_temp_file (int *pfd, bool survive_fd_exhaustion) { static char const slashbase[] = "/sortXXXXXX"; static size_t temp_dir_index; @@ -768,8 +771,13 @@ create_temp_file (int *pfd) errno = saved_errno; if (fd < 0) - error (SORT_FAILURE, errno, _("cannot create temporary file in %s"), - quote (temp_dir)); + { + if (! (survive_fd_exhaustion && errno == EMFILE)) + error (SORT_FAILURE, errno, _("cannot create temporary file in %s"), + quote (temp_dir)); + free (node); + node = NULL; + } *pfd = fd; return node; @@ -779,27 +787,30 @@ create_temp_file (int *pfd) standard output; HOW should be "w". When opening for input, "-" means standard input. To avoid confusion, do not return file descriptors STDIN_FILENO, STDOUT_FILENO, or STDERR_FILENO when - opening an ordinary FILE. */ + opening an ordinary FILE. Return NULL if unsuccessful. */ static FILE * -xfopen (const char *file, const char *how) +stream_open (const char *file, const char *how) { - FILE *fp; - if (!file) - fp = stdout; - else if (STREQ (file, "-") && *how == 'r') + return stdout; + if (STREQ (file, "-") && *how == 'r') { have_read_stdin = true; - fp = stdin; - } - else - { - fp = fopen (file, how); - if (! fp) - die (_("open failed"), file); + return stdin; } + return fopen (file, how); +} + +/* Same as stream_open, except always return a non-null value; die on + failure. 
*/ +static FILE * +xfopen (const char *file, const char *how) + { + FILE *fp = stream_open (file, how); + if (!fp) + die (_("open failed"), file); return fp; } @@ -838,7 +849,8 @@ dup2_or_die (int oldfd, int newfd) /* Fork a child process for piping to and do common cleanup. The TRIES parameter tells us how many times to try to fork before - giving up. Return the PID of the child or -1 if fork failed. */ + giving up. Return the PID of the child, or -1 (setting errno) + on failure. */ static pid_t pipe_fork (int pipefds[2], size_t tries) @@ -881,8 +893,10 @@ pipe_fork (int pipefds[2], size_t tries) if (pid < 0) { + saved_errno = errno; close (pipefds[0]); close (pipefds[1]); + errno = saved_errno; } else if (pid == 0) { @@ -900,15 +914,22 @@ pipe_fork (int pipefds[2], size_t tries) } /* Create a temporary file and start a compression program to filter output - to that file. Set *PFP to the file handle and if *PPID is non-NULL, - set it to the PID of the newly-created process. */ + to that file. Set *PFP to the file handle and if PPID is non-NULL, + set *PPID to the PID of the newly-created process. If the creation + fails, return NULL if the failure is due to file descriptor + exhaustion and SURVIVE_FD_EXHAUSTION; otherwise, die. */ static char * -create_temp (FILE **pfp, pid_t *ppid) +maybe_create_temp (FILE **pfp, pid_t *ppid, bool survive_fd_exhaustion) { int tempfd; - struct tempnode *node = create_temp_file (&tempfd); - char *name = node->name; + struct tempnode *node = create_temp_file (&tempfd, survive_fd_exhaustion); + char *name; + + if (! node) + return NULL; + + name = node->name; if (compress_program) { @@ -949,48 +970,68 @@ create_temp (FILE **pfp, pid_t *ppid) return name; } +/* Create a temporary file and start a compression program to filter output + to that file. Set *PFP to the file handle and if *PPID is non-NULL, + set it to the PID of the newly-created process. Die on failure. 
*/ + +static char * +create_temp (FILE **pfp, pid_t *ppid) +{ + return maybe_create_temp (pfp, ppid, false); +} + /* Open a compressed temp file and start a decompression process through which to filter the input. PID must be the valid processes ID of the - process used to compress the file. */ + process used to compress the file. Return NULL (setting errno to + EMFILE) if we ran out of file descriptors, and die on any other + kind of failure. */ static FILE * open_temp (const char *name, pid_t pid) { int tempfd, pipefds[2]; - pid_t child_pid; - FILE *fp; + FILE *fp = NULL; wait_proc (pid); tempfd = open (name, O_RDONLY); if (tempfd < 0) - die (_("couldn't open temporary file"), name); + return NULL; - child_pid = pipe_fork (pipefds, MAX_FORK_TRIES_DECOMPRESS); - if (0 < child_pid) + switch (pipe_fork (pipefds, MAX_FORK_TRIES_DECOMPRESS)) { + case -1: + if (errno != EMFILE) + error (SORT_FAILURE, errno, _("couldn't create process for %s -d"), + compress_program); close (tempfd); - close (pipefds[1]); - } - else if (child_pid == 0) - { + errno = EMFILE; + break; + + case 0: close (pipefds[0]); dup2_or_die (tempfd, STDIN_FILENO); close (tempfd); dup2_or_die (pipefds[1], STDOUT_FILENO); close (pipefds[1]); - if (execlp (compress_program, compress_program, "-d", (char *) NULL) < 0) - error (SORT_FAILURE, errno, _("couldn't execute %s -d"), - compress_program); - } - else - error (SORT_FAILURE, errno, _("couldn't create process for %s -d"), - compress_program); + execlp (compress_program, compress_program, "-d", (char *) NULL); + error (SORT_FAILURE, errno, _("couldn't execute %s -d"), + compress_program); - fp = fdopen (pipefds[0], "r"); - if (! fp) - die (_("couldn't create temporary file"), name); + default: + close (tempfd); + close (pipefds[1]); + + fp = fdopen (pipefds[0], "r"); + if (! 
fp) + { + int saved_errno = errno; + close (pipefds[0]); + errno = saved_errno; + } + break; + } return fp; } @@ -2148,31 +2189,53 @@ check (char const *file_name, char checkonly) return ordered; } +/* Open FILES (there are NFILES of them) and store the resulting array + of stream pointers into (*PFPS). Allocate the array. Return the + number of successfully opened files, setting errno if this value is + less than NFILES. */ + +static size_t +open_input_files (struct sortfile *files, size_t nfiles, FILE ***pfps) +{ + FILE **fps = *pfps = xnmalloc (nfiles, sizeof *fps); + int i; + + /* Open as many input files as we can. */ + for (i = 0; i < nfiles; i++) + { + fps[i] = (files[i].pid + ? open_temp (files[i].name, files[i].pid) + : stream_open (files[i].name, "r")); + if (!fps[i]) + break; + } + + return i; +} + /* Merge lines from FILES onto OFP. NTEMPS is the number of temporary files (all of which are at the start of the FILES array), and NFILES is the number of files; 0 <= NTEMPS <= NFILES <= NMERGE. - Close input and output files before returning. + FPS is the vector of open stream corresponding to the files. + Close input and output streams before returning. OUTPUT_FILE gives the name of the output file. If it is NULL, - the output file is standard output. If OFP is NULL, the output - file has not been opened yet (or written to, if standard output). */ + the output file is standard output. */ static void mergefps (struct sortfile *files, size_t ntemps, size_t nfiles, - FILE *ofp, char const *output_file) + FILE *ofp, char const *output_file, FILE **fps) { - FILE **fps = xnmalloc (nmerge, sizeof *fps); - /* Input streams for each file. */ - struct buffer *buffer = xnmalloc (nmerge, sizeof *buffer); + struct buffer *buffer = xnmalloc (nfiles, sizeof *buffer); /* Input buffers for each file. */ struct line saved; /* Saved line storage for unique check. */ struct line const *savedline = NULL; /* &saved if there is a saved line. 
*/ size_t savealloc = 0; /* Size allocated for the saved line. */ - struct line const **cur = xnmalloc (nmerge, sizeof *cur); + struct line const **cur = xnmalloc (nfiles, sizeof *cur); /* Current line in each line table. */ - struct line const **base = xnmalloc (nmerge, sizeof *base); + struct line const **base = xnmalloc (nfiles, sizeof *base); /* Base of each line table. */ - size_t *ord = xnmalloc (nmerge, sizeof *ord); + size_t *ord = xnmalloc (nfiles, sizeof *ord); /* Table representing a permutation of fps, such that cur[ord[0]] is the smallest line and will be next output. */ @@ -2185,9 +2248,6 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles, /* Read initial lines from each input file. */ for (i = 0; i < nfiles; ) { - fps[i] = (files[i].pid - ? open_temp (files[i].name, files[i].pid) - : xfopen (files[i].name, "r")); initbuf (&buffer[i], sizeof (struct line), MAX (merge_buffer_size, sort_size / nfiles)); if (fillbuf (&buffer[i], fps[i], files[i].name)) @@ -2209,13 +2269,13 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles, free (buffer[i].buf); --nfiles; for (j = i; j < nfiles; ++j) - files[j] = files[j + 1]; + { + files[j] = files[j + 1]; + fps[j] = fps[j + 1]; + } } } - if (! ofp) - ofp = xfopen (output_file, "w"); - /* Set up the ord table according to comparisons among input lines. Since this only reorders two items if one is strictly greater than the other, it is stable. */ @@ -2353,6 +2413,28 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles, free(cur); } +/* Merge lines from FILES onto OFP. NTEMPS is the number of temporary + files (all of which are at the start of the FILES array), and + NFILES is the number of files; 0 <= NTEMPS <= NFILES <= NMERGE. + Close input and output files before returning. + OUTPUT_FILE gives the name of the output file. + + Return the number of files successfully merged. 
This number can be + less than NFILES if we ran low on file descriptors, but in this + case it is never less than 2. */ + +static size_t +mergefiles (struct sortfile *files, size_t ntemps, size_t nfiles, + FILE *ofp, char const *output_file) +{ + FILE **fps; + size_t nopened = open_input_files (files, nfiles, &fps); + if (nopened < nfiles && nopened < 2) + die (_("open failed"), files[nopened].name); + mergefps (files, ntemps, nopened, ofp, output_file, fps); + return nopened; +} + /* Merge into T the two sorted arrays of lines LO (with NLO members) and HI (with NHI members). T, LO, and HI point just past their respective arrays, and the arrays are in reverse order. NLO and @@ -2519,10 +2601,19 @@ avoid_trashing_input (struct sortfile *files, size_t ntemps, FILE *tftp; pid_t pid; char *temp = create_temp (&tftp, &pid); - mergefps (&files[i],0, nfiles - i, tftp, temp); - files[i].name = temp; - files[i].pid = pid; - return i + 1; + size_t num_merged = 0; + while (i + num_merged < nfiles) + { + num_merged += mergefiles (&files[i], 0, nfiles - i, tftp, temp); + files[i].name = temp; + files[i].pid = pid; + + memmove(&files[i], &files[i + num_merged], + num_merged * sizeof *files); + ntemps += 1; + nfiles -= num_merged - 1;; + i += num_merged; + } } } @@ -2553,17 +2644,20 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, /* Number of easily-available slots at the next loop iteration. */ size_t cheap_slots; - /* Do as many NMERGE-size merges as possible. */ - for (out = in = 0; out < nfiles / nmerge; out++, in += nmerge) + /* Do as many NMERGE-size merges as possible. In the case that + nmerge is bogus, increment by the maximum number of file + descriptors allowed. 
*/ + for (out = in = 0; nmerge <= nfiles - in; out++) { FILE *tfp; pid_t pid; char *temp = create_temp (&tfp, &pid); - size_t nt = MIN (ntemps, nmerge); - ntemps -= nt; - mergefps (&files[in], nt, nmerge, tfp, temp); + size_t num_merged = mergefiles (&files[in], MIN (ntemps, nmerge), + nmerge, tfp, temp); + ntemps -= MIN (ntemps, num_merged); files[out].name = temp; files[out].pid = pid; + in += num_merged; } remainder = nfiles - in; @@ -2578,12 +2672,12 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, FILE *tfp; pid_t pid; char *temp = create_temp (&tfp, &pid); - size_t nt = MIN (ntemps, nshortmerge); - ntemps -= nt; - mergefps (&files[in], nt, nshortmerge, tfp, temp); + size_t num_merged = mergefiles (&files[in], MIN (ntemps, nshortmerge), + nshortmerge, tfp, temp); + ntemps -= MIN (ntemps, num_merged); files[out].name = temp; files[out++].pid = pid; - in += nshortmerge; + in += num_merged; } /* Put the remaining input files into the last NMERGE-sized output @@ -2594,7 +2688,57 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, } nfiles = avoid_trashing_input (files, ntemps, nfiles, output_file); - mergefps (files, ntemps, nfiles, NULL, output_file); + + /* We aren't guaranteed that this final mergefiles will work, therefore we + try to merge into the output, and then merge as much as we can into a + temp file if we can't. Repeat. */ + + for (;;) + { + /* Merge directly into the output file if possible. */ + FILE **fps; + size_t nopened = open_input_files (files, nfiles, &fps); + + if (nopened == nfiles) + { + FILE *ofp = stream_open (output_file, "w"); + if (ofp) + { + mergefps (files, ntemps, nfiles, ofp, output_file, fps); + break; + } + if (errno != EMFILE || nopened <= 2) + die (_("open failed"), output_file); + } + else if (nopened <= 2) + die (_("open failed"), files[nopened].name); + + /* We ran out of file descriptors. Close one of the input + files, to gain a file descriptor. 
Then create a temporary + file with our spare file descriptor. Retry if that failed + (e.g., some other process could open a file between the time + we closed and tried to create). */ + FILE *tfp; + pid_t pid; + char *temp; + do + { + nopened--; + xfclose (fps[nopened], files[nopened].name); + temp = maybe_create_temp (&tfp, &pid, ! (nopened <= 2)); + } + while (!temp); + + /* Merge into the newly allocated temporary. */ + mergefps (&files[0], MIN (ntemps, nopened), nopened, tfp, temp, fps); + ntemps -= MIN (ntemps, nopened); + files[0].name = temp; + files[0].pid = pid; + + memmove (&files[1], &files[nopened], (nfiles - nopened) * sizeof *files); + ntemps++; + nfiles -= nopened - 1; + } } /* Sort NFILES FILES onto OUTPUT_FILE. */ diff --git a/tests/Makefile.am b/tests/Makefile.am index 10be0c6..2fb01c4 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -201,6 +201,7 @@ TESTS = \ misc/shuf \ misc/sort \ misc/sort-compress \ + misc/sort-continue \ misc/sort-files0-from \ misc/sort-merge \ misc/sort-rand \ diff --git a/tests/misc/sort-continue b/tests/misc/sort-continue new file mode 100755 index 0000000..b94cb2a --- /dev/null +++ b/tests/misc/sort-continue @@ -0,0 +1,47 @@ +#!/bin/bash +# Tests for file descriptor exhaustion. + +# Copyright (C) 2009 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. 
+ +if test "$VERBOSE" = yes; then + set -x + sort --version +fi + +. $srcdir/test-lib.sh + +for i in `seq 31`; do + echo $i | tee -a in > __test.$i || framework_failure +done + +fail=0 + +( + ulimit -n 6 + exec 0</dev/null 3<&- 4<&- 5<&- + sort -n -m __test.* > out +) && +compare in out || { fail=1; echo "file descriptor exhaustion not handled" 1>&2; } + +echo "32" | tee -a in > in1 +( + ulimit -n 6 + exec 3<&- 4<&- 5<&- + cat in1 | sort -n -m __test.* - > out +) && +compare in out || { fail=1; echo "stdin not handled properly" 1>&2; } + +Exit $fail -- 1.6.2.rc1.285.gc5f54 >From dbc8690bc5cbba66737a8c88b4a643f3e6645948 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@redhat.com> Date: Tue, 17 Mar 2009 18:17:09 +0100 Subject: [PATCH 2/4] tweak test --- tests/misc/sort-continue | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/misc/sort-continue b/tests/misc/sort-continue index b94cb2a..b1031fa 100755 --- a/tests/misc/sort-continue +++ b/tests/misc/sort-continue @@ -23,7 +23,7 @@ fi . $srcdir/test-lib.sh -for i in `seq 31`; do +for i in $(seq 31); do echo $i | tee -a in > __test.$i || framework_failure done @@ -34,14 +34,14 @@ fail=0 exec 0</dev/null 3<&- 4<&- 5<&- sort -n -m __test.* > out ) && -compare in out || { fail=1; echo "file descriptor exhaustion not handled" 1>&2; } +compare in out || { fail=1; echo 'file descriptor exhaustion not handled' 1>&2; } -echo "32" | tee -a in > in1 +echo 32 | tee -a in > in1 ( ulimit -n 6 exec 3<&- 4<&- 5<&- cat in1 | sort -n -m __test.* - > out ) && -compare in out || { fail=1; echo "stdin not handled properly" 1>&2; } +compare in out || { fail=1; echo 'stdin not handled properly' 1>&2; } Exit $fail -- 1.6.2.rc1.285.gc5f54 >From 526a803a4cebfe097a517a91fdc7ac40cd2b0168 Mon Sep 17 00:00:00 2001 From: Paul Eggert <egg...@cs.ucla.edu> Date: Mon, 9 Mar 2009 14:56:13 -0700 Subject: [PATCH 3/4] tests: add another sort/nmerge test * tests/Makefile.am (TESTS): Add sort-merge-fdlimit. 
* tests/misc/sort-merge-fdlimit: New file. * doc/coreutils.texi (sort invocation): Document that we now silently lower nmerge if necessary. Patch by Paul Eggert, Nima Nikzad, Max Chang, Alexander Nguyen, Sahil Amoli, and Nick Graham. --- THANKS | 5 ++++ doc/coreutils.texi | 14 +++++++---- tests/Makefile.am | 1 + tests/misc/sort-merge-fdlimit | 54 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 69 insertions(+), 5 deletions(-) create mode 100755 tests/misc/sort-merge-fdlimit diff --git a/THANKS b/THANKS index 3b933ff..d4b2d61 100644 --- a/THANKS +++ b/THANKS @@ -22,6 +22,7 @@ Albert Hopkins ahopk...@dynacare.com Alberto Accomazzi albe...@cfa0.harvard.edu aldomel aldo...@ix.netcom.com Alen Muzinic zv...@fly.cc.fer.hr +Alexander Nguyen v...@seas.ucla.edu Alexander V. Lukyanov l...@netis.ru Allen Hewes al...@decisiv.net Axel Dörfler ax...@pinc-software.de @@ -380,6 +381,7 @@ Matthew Woehlke mw_tr...@users.sourceforge.net Matthias Urlichs sm...@noris.de Matti Aarnio matti.aar...@zmailer.org Mattias Wadenstein mas...@acc.umu.se +Max Chang maxch...@ucla.edu Meelis Roos mr...@tartu.cyber.ee Michael mich...@aplatform.com Michael ??? mich...@roka.net @@ -417,11 +419,13 @@ Neal H Walfield n...@cs.uml.edu Neil Brown ne...@cse.unsw.edu.au Nelson H. F. 
Beebe be...@math.utah.edu Nick Estes deb...@nickstoys.com +Nick Graham nick.d.gra...@gmail.com Nick Lawes nla...@silverplatter.com Nickolai Zeldovich nicko...@cs.stanford.edu Nicolas François nicolas.franc...@centraliens.net Niklas Edmundsson ni...@acc.umu.se Nikola Milutinovic nikola.milutino...@ev.co.yu +Nima Nikzad nnik...@ucla.edu Noah Friedman fried...@splode.com Noel Cragg n...@red-bean.com Norbert Kiesel nkie...@tbdnetworks.com @@ -494,6 +498,7 @@ Ross Alexander r.alexan...@auckland.ac.nz Ross Paterson r...@doc.ic.ac.uk Ross Ridge rri...@calum.csclub.uwaterloo.ca Rudolf Kastl rka...@redhat.com +Sahil Amoli sahilam...@gmail.com Sami Farin sfa...@ratol.fi Samuel Tardieu s...@rfc1149.net Samuel Thibault samuel.thiba...@ens-lyon.org diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 6f8c197..04db676 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -3937,13 +3937,17 @@ sort invocation requirements and I/0 at the expense of temporary storage consumption and merge performance. -The value of @var{nmerge} must be at least 2. +The value of @var{nmerge} must be at least 2. The default value is +currently 16, but this is implementation-dependent and may change in +the future. The value of @var{nmerge} may be bounded by a resource limit for open -file descriptors. Try @samp{ulimit -n} or @samp{getconf OPEN_MAX} to -to display the limit for a particular system. -If the value of @var{nmerge} exceeds this limit, then @command{sort} will -issue a warning to standard error and exit with a nonzero status. +file descriptors. The commands @samp{ulimit -n} or @samp{getconf +OPEN_MAX} may display limits for your systems; these limits may be +modified further if your program already has some files open, or if +the operating system has other limits on the number of open files. If +the value of @var{nmerge} exceeds the resource limit, @command{sort} +silently uses a smaller value. 
@item -o @var{output-file} @itemx --outp...@var{output-file} diff --git a/tests/Makefile.am b/tests/Makefile.am index 2fb01c4..3a15a87 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -204,6 +204,7 @@ TESTS = \ misc/sort-continue \ misc/sort-files0-from \ misc/sort-merge \ + misc/sort-merge-fdlimit \ misc/sort-rand \ misc/sort-version \ misc/split-a \ diff --git a/tests/misc/sort-merge-fdlimit b/tests/misc/sort-merge-fdlimit new file mode 100755 index 0000000..82305be --- /dev/null +++ b/tests/misc/sort-merge-fdlimit @@ -0,0 +1,54 @@ +#!/bin/sh +# Test whether sort avoids opening more file descriptors than it is +# allowed when merging files. + +# Copyright (C) 2009 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +if test "$VERBOSE" = yes; then + set -x + sort --version +fi + +. $srcdir/test-lib.sh +require_ulimit_ + +mkdir in err || framework_failure + +fail=0 + +for i in `seq 17`; do + echo $i >in/$i +done + +# When these tests are run inside the automated testing framework, they +# have one less available file descriptor than when run outside the +# automated testing framework. If a test with a batch size of b fails +# inside the ATF, then the same test with batch size b+1 may pass outside +# the ATF but fail inside it. + +# The default batch size (nmerge) is 16. 
+(ulimit -n 19 \ + && sort -m --batch-size=16 in/* 2>err/merge-default-err \ + || ! grep "open failed" err/merge-default-err) || fail=1 + +# If sort opens a file (/dev/urandom) to sort by random hashes of keys, +# it needs to consider this file against its limit on open file +# descriptors. +(ulimit -n 20 \ + && sort -mR --batch-size=16 in/* 2>err/merge-random-err \ + || ! grep "open failed" err/merge-random-err) || fail=1 + +Exit $fail -- 1.6.2.rc1.285.gc5f54 >From 4456212bceb21e1b22c750ea6d9beabb6e82b6bc Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@redhat.com> Date: Tue, 17 Mar 2009 21:23:51 +0100 Subject: [PATCH 4/4] tests: adjust sort-continue not to fail under valgrind * tests/misc/sort-continue: Don't run cat inside fd-limited shell. If sort fails to run in an fd-limited shell, skip the test. --- tests/misc/sort-continue | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/tests/misc/sort-continue b/tests/misc/sort-continue index b1031fa..1b0ef43 100755 --- a/tests/misc/sort-continue +++ b/tests/misc/sort-continue @@ -23,6 +23,10 @@ fi . $srcdir/test-lib.sh +# Skip the test when running under valgrind. +( ulimit -n 6; sort < /dev/null ) \ + || skip_test_ 'fd-limited sort failed; are you running under valgrind?' + for i in $(seq 31); do echo $i | tee -a in > __test.$i || framework_failure done @@ -40,7 +44,7 @@ echo 32 | tee -a in > in1 ( ulimit -n 6 exec 3<&- 4<&- 5<&- - cat in1 | sort -n -m __test.* - > out + sort -n -m __test.* - < in1 > out ) && compare in out || { fail=1; echo 'stdin not handled properly' 1>&2; } -- 1.6.2.rc1.285.gc5f54 _______________________________________________ Bug-coreutils mailing list Bug-coreutils@gnu.org http://lists.gnu.org/mailman/listinfo/bug-coreutils