On 03/01/2012 07:44 PM, Jérémy Compostella wrote:
> Hi Pádraig,
> 
> Finally, I found time to work again on this patch. I provided it as
> attachment.
> 
> I'm not completely satisfied with the documentation part. I've tried to
> be more specific but it becomes quickly complicated. So I get back the
> original explanation which does not completely satisfied me since it
> does not explain the following point:
> - When -a is not specified:
>   - output file names are considered exhausted when the first suffix
>     character will become 'z'.
>   -'z[a-z]' suffixes are never used
>   - the fact that suffix length is increased each time the suffix are
>   "exhausted".
> But maybe it's not relevant to be so specific.
> 
> What is your opinion about this point ?

Well documentation can easily include too much detail.
But one should try to clarify any ambiguities
a user might be considering.

I adjusted that documentation as follows:

@@ -2990,11 +2990,15 @@ The output files' names consist of @var{prefix} (@samp{>
 followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by
 default), such that concatenating the output files in traditional
 sorted order by file name produces the original input file (except
-@option{-nr/@var{n}}). If more than 676 output files are required and
-@samp{-a} is not specified, @command{split} uses @samp{zaaa},
-@samp{zaab}, etc.  If the @option{-a} is specified and the output file
-names are exhausted, @command{split} reports an error without deleting
-the output files that it did create.
+@option{-nr/@var{n}}).  By default split will initially create files
+with two generated suffix characters, and will increase this width by two
+when the next most significant position reaches the last character.
+(@samp{yz}, @samp{zaaa}, @dots{}).  In this way an arbitrary number
+of output files are supported which sort as described above,
+even in the presence of an @option{--additional-suffix} option.
+If the @option{-a} option is specified and the output file names are exhausted,
+@command{split} reports an error without deleting the output files
+that it did create.

I made a few adjustments to the code also.

Disable this feature if --numeric=FROM is specified,
because these suffixes are not all consecutive
and hence we can't support arbitrary FROM values.

Disable this feature if a specific number of
files is specified with the -n option.
We already auto select the width appropriately.

I could find no logic flaws in the implementation in new_file_name().
Well there was one nit with:

  sufindex = xrealloc (sufindex, suffix_length * sizeof *sufindex);

That should be as follows to avoid overflow issues:

  sufindex = xnrealloc (sufindex, suffix_length, sizeof *sufindex);

While looking at that I noticed a fair bit of overlap
from the existing fixed width suffix initialization,
so I merged the two with a goto.  That brought the
new implementation down from 15 to 8 significant lines,
and those 8 are now mostly simple integer width adjustments.
Also by doing this we get to take advantage of the _PC_NAME_MAX
check in this section.

I also added a test.

Full patch is attached.
I hope to apply this sometime tomorrow.

cheers,
Pádraig.
>From 9564ee6960f08fc4086e3f0eb238131f3f7ae159 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=A9my=20Compostella?=
 <[email protected]>
Date: Thu, 1 Mar 2012 20:37:41 +0100
Subject: [PATCH] split: support an arbitrary number of split files by default

* src/split.c (next_file_name): If `suffix_auto' is true and the first
suffix character is 'z', generate a new file file name adding `z' to
the prefix and increasing the suffix length by one.
(set_suffix_length): Disable auto suffix width in various cases.
* tests/split/suffix-auto-length: Test it.
* doc/coreutils.texi (split invocation): Mention it.
* NEWS (Improvements): Likewise.
---
 NEWS                           |    2 +
 doc/coreutils.texi             |   12 ++++++--
 src/split.c                    |   60 ++++++++++++++++++++++++++++++++++++----
 tests/Makefile.am              |    1 +
 tests/split/suffix-auto-length |   42 ++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+), 9 deletions(-)
 create mode 100755 tests/split/suffix-auto-length

diff --git a/NEWS b/NEWS
index 7f36dc6..87ef7bd 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,8 @@ GNU coreutils NEWS                                    -*- 
outline -*-
   systems for which getfilecon-, ACL-check- and XATTR-check-induced syscalls
   fail with ENOTSUP or similar.
 
+  split now supports an unlimited number of split files as default behavior.
+
 
 * Noteworthy changes in release 8.15 (2012-01-06) [stable]
 
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index cce9432..154bcfe 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -2990,9 +2990,15 @@ The output files' names consist of @var{prefix} 
(@samp{x} by default)
 followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by
 default), such that concatenating the output files in traditional
 sorted order by file name produces the original input file (except
-@option{-nr/@var{n}}).  If the output file names are exhausted,
-@command{split} reports an error without deleting the output files
-that it did create.
+@option{-nr/@var{n}}).  By default split will initially create files
+with two generated suffix characters, and will increase this width by two
+when the next most significant position reaches the last character.
+(@samp{yz}, @samp{zaaa}, @samp{zaab}, @dots{}).  In this way an arbitrary
+number of output files are supported, which sort as described above,
+even in the presence of an @option{--additional-suffix} option.
+If the @option{-a} option is specified and the output file names are
+exhausted, @command{split} reports an error without deleting the
+output files that it did create.
 
 The program accepts the following options.  Also see @ref{Common options}.
 
diff --git a/src/split.c b/src/split.c
index 68c9a34..ca5be9f 100644
--- a/src/split.c
+++ b/src/split.c
@@ -74,6 +74,9 @@ static char *outfile;
    Suffixes are inserted here.  */
 static char *outfile_mid;
 
+/* Generate new suffix when suffixes are exhausted.  */
+static bool suffix_auto = true;
+
 /* Length of OUTFILE's suffix.  */
 static size_t suffix_length;
 
@@ -155,6 +158,12 @@ set_suffix_length (uintmax_t n_units, enum Split_type 
split_type)
 
   size_t suffix_needed = 0;
 
+  /* The suffix auto length feature is incompatible with
+     a user specified start value as the generated suffixes
+     are not all consecutive.  */
+  if (numeric_suffix_start)
+    suffix_auto = false;
+
   /* Auto-calculate the suffix length if the number of files is given.  */
   if (split_type == type_chunk_bytes || split_type == type_chunk_lines
       || split_type == type_rr)
@@ -164,6 +173,7 @@ set_suffix_length (uintmax_t n_units, enum Split_type 
split_type)
       while (n_units /= alphabet_len)
         suffix_needed++;
       suffix_needed += alphabet_slop;
+      suffix_auto = false;
     }
 
   if (suffix_length)            /* set by user */
@@ -174,6 +184,7 @@ set_suffix_length (uintmax_t n_units, enum Split_type 
split_type)
                  _("the suffix length needs to be at least %zu"),
                  suffix_needed);
         }
+      suffix_auto = false;
       return;
     }
   else
@@ -242,27 +253,61 @@ next_file_name (void)
 {
   /* Index in suffix_alphabet of each character in the suffix.  */
   static size_t *sufindex;
+  static size_t outbase_length;
+  static size_t outfile_length;
+  static size_t addsuf_length;
 
   if (! outfile)
     {
-      /* Allocate and initialize the first file name.  */
+      bool widen;
+
+new_name:
+      widen = !! outfile_length;
+
+      if (! widen)
+        {
+          /* Allocate and initialize the first file name.  */
+
+          outbase_length = strlen (outbase);
+          addsuf_length = additional_suffix ? strlen (additional_suffix) : 0;
+          outfile_length = outbase_length + suffix_length + addsuf_length;
+        }
+      else
+        {
+          /* Reallocate and initialize a new wider file name.
+             We do this by subsuming the unchanging part of
+             the generated suffix into the prefix (base), and
+             reinitializing the now one longer suffix.  */
+
+          outfile_length += 2;
+          suffix_length++;
+        }
 
-      size_t outbase_length = strlen (outbase);
-      size_t addsuf_length = additional_suffix ? strlen (additional_suffix) : 
0;
-      size_t outfile_length = outbase_length + suffix_length + addsuf_length;
       if (outfile_length + 1 < outbase_length)
         xalloc_die ();
-      outfile = xmalloc (outfile_length + 1);
+      outfile = xrealloc (outfile, outfile_length + 1);
+
+      if (! widen)
+        memcpy (outfile, outbase, outbase_length);
+      else
+        {
+          /* Append the last alphabet character to the file name prefix.  */
+          outfile[outbase_length] = suffix_alphabet[sufindex[0]];
+          outbase_length++;
+        }
+
       outfile_mid = outfile + outbase_length;
-      memcpy (outfile, outbase, outbase_length);
       memset (outfile_mid, suffix_alphabet[0], suffix_length);
       if (additional_suffix)
         memcpy (outfile_mid + suffix_length, additional_suffix, addsuf_length);
       outfile[outfile_length] = 0;
+      free (sufindex);
       sufindex = xcalloc (suffix_length, sizeof *sufindex);
 
       if (numeric_suffix_start)
         {
+          assert (! widen);
+
           /* Update the output file name.  */
           size_t i = strlen (numeric_suffix_start);
           memcpy (outfile_mid + suffix_length - i, numeric_suffix_start, i);
@@ -295,12 +340,15 @@ next_file_name (void)
       while (i-- != 0)
         {
           sufindex[i]++;
+          if (suffix_auto && i == 0 && ! suffix_alphabet[sufindex[0] + 1])
+            goto new_name;
           outfile_mid[i] = suffix_alphabet[sufindex[i]];
           if (outfile_mid[i])
             return;
           sufindex[i] = 0;
           outfile_mid[i] = suffix_alphabet[sufindex[i]];
         }
+
       error (EXIT_FAILURE, 0, _("output file suffixes exhausted"));
     }
 }
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 5e184ac..d7a1837 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -253,6 +253,7 @@ TESTS =                                             \
   misc/sort-version                            \
   misc/sort-NaN-infloop                                \
   split/filter                                 \
+  split/suffix-auto-length                     \
   split/suffix-length                          \
   split/additional-suffix                      \
   split/b-chunk                                        \
diff --git a/tests/split/suffix-auto-length b/tests/split/suffix-auto-length
new file mode 100755
index 0000000..dacc951
--- /dev/null
+++ b/tests/split/suffix-auto-length
@@ -0,0 +1,42 @@
+#!/bin/sh
+# Test the suffix auto widening functionality
+
+# Copyright (C) 2012 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ split
+
+
+# ensure this feature is off when start number specified
+truncate -s12 file.in
+split file.in -b1 --numeric=89 && fail=1
+test "$(ls -1 x* | wc -l)" = 11 || fail=1
+rm -f x*
+
+# ensure this feature works when no start num specified
+truncate -s91 file.in
+for prefix in 'x' 'xx' ''; do
+    for add_suffix in '.txt' ''; do
+      split file.in "$prefix" -b1 --numeric --additional-suffix="$add_suffix" \
+        || fail=1
+      test "$(ls -1 $prefix*[0-9]*$add_suffix | wc -l)" = 91 || fail=1
+      test -e "${prefix}89$add_suffix" || fail=1
+      test -e "${prefix}9000$add_suffix" || fail=1
+      rm -f $prefix*[0-9]*$add_suffix
+    done
+done
+
+Exit $fail
-- 
1.7.6.4

Reply via email to