Your message dated Sun, 3 Apr 2016 14:22:12 +0000
with message-id <[email protected]>
and subject line Re: large exclusions unacceptably slow
has caused the Debian Bug report #221482,
regarding large exclusions unacceptably slow
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
221482: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=221482
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: tar
Version: 1.13.25-2

Tar will consume inordinate amounts of cpu time if you use the
--exclude-from=file with a large file or many --exclude options. The time
spent winds up being O(potential_archive_members * total_exclusions), which
for any meaningful number of exclusions is a very, very big number. For
example, with an 8MB exclude list containing aprox 90,000 items, tar runs at
a rate of about 1 file per second on a 1400mhz pIII with 512KB CPU cache!
Even if my tape drive and disks were infinitely fast, it would still take
aproximately 43 hours for tar to back up my system with 158205 files.

The attached patch causes tar to use a hashed data structure to store the
exclusions, resulting in acceptable performance. See
http://www.proudman51.freeserve.co.uk/tar.html

-- 
Brian Ristuccia
diff --new-file -r -U 5 tar-1.13.25/ChangeLog tar-1.13.25pp/ChangeLog
--- tar-1.13.25/ChangeLog       Wed Sep 26 16:54:08 2001
+++ tar-1.13.25pp/ChangeLog     Fri May  3 20:34:07 2002
@@ -1,5 +1,34 @@
+
+2002-05-04  Phil Proudman <[email protected]>, Mark Oakden 
<[email protected]>
+
+       * lib/excludefast.c: New file, implements hashed exclusions
+       
+       * lib/excludefast.h: As above
+       
+       * lib/exclude.c: Changed to use excludefast whenever possible
+       and appropriate to do so. This offers a great speed improvement.
+       Note that a different hash must be used for each different
+       optionflag setting that is used.
+
+       * lib/exclude.h: Removed unused parameter from add_exclude_file
+
+       * lib/Makefile.in: Added excludefast files
+       
+       * src/tar.c: Changed "--anchored" flag to be default, this now
+       agrees with the docs. Fixed OBSOLETE_VERSION_CONTROL fallthrough.
+       Made the EXCLUDE_INCLUDE flag accessible to the user.
+       
+       * tests/exclude01.sh: New test for --no-wildcards
+       * tests/exclude02.sh: New test for directory exclusion
+       * tests/exclude03.sh: New test for filename simplification
+       * tests/exclude04.sh: New test for --no-anchored
+       * tests/exclude05.sh: New test for --anchored
+       * tests/exclude06.sh: New test for --ignore-case
+       * tests/exclude07.sh: New test for --exclude-include
+       * tests/Makefile.in: Added new tests
+       
 2001-09-26  Paul Eggert  <[email protected]>
 
        * NEWS, configure.ac (AM_INIT_AUTOMAKE): Version 1.13.25.
 
        * src/buffer.c (flush_read): Don't diagnose partial blocks before
diff --new-file -r -U 5 tar-1.13.25/lib/Makefile.in 
tar-1.13.25pp/lib/Makefile.in
--- tar-1.13.25/lib/Makefile.in Wed Sep 26 16:34:09 2001
+++ tar-1.13.25pp/lib/Makefile.in       Fri May  3 06:22:16 2002
@@ -117,20 +117,22 @@
 noinst_HEADERS = \
   argmatch.h backupfile.h dirname.h error.h exclude.h \
   fnmatch.hin full-write.h getdate.h getline.h getopt.h getstr.h \
   hash.h human.h lchown.h modechange.h prepargs.h print-copyr.h \
   quote.h quotearg.h \
-  save-cwd.h savedir.h safe-read.h unicodeio.h xalloc.h xstrtol.h
+  save-cwd.h savedir.h safe-read.h unicodeio.h xalloc.h xstrtol.h \
+  excludefast.h
 
 
 libtar_a_SOURCES = \
   addext.c argmatch.c backupfile.c basename.c dirname.c error.c \
   exclude.c full-write.c getdate.y getopt.c getopt1.c getstr.c \
   hash.c human.c modechange.c msleep.c prepargs.c print-copyr.c \
   quote.c quotearg.c safe-read.c save-cwd.c savedir.c \
   unicodeio.c xgetcwd.c xmalloc.c xstrdup.c \
-  xstrtoimax.c xstrtoul.c xstrtoumax.c
+  xstrtoimax.c xstrtoul.c xstrtoumax.c \
+  excludefast.c
 
 
 INCLUDES = -I../intl
 
 libtar_a_LIBADD = @ALLOCA@ @LIBOBJS@
@@ -152,11 +154,11 @@
        modechange.$(OBJEXT) msleep.$(OBJEXT) prepargs.$(OBJEXT) \
        print-copyr.$(OBJEXT) quote.$(OBJEXT) quotearg.$(OBJEXT) \
        safe-read.$(OBJEXT) save-cwd.$(OBJEXT) savedir.$(OBJEXT) \
        unicodeio.$(OBJEXT) xgetcwd.$(OBJEXT) xmalloc.$(OBJEXT) \
        xstrdup.$(OBJEXT) xstrtoimax.$(OBJEXT) xstrtoul.$(OBJEXT) \
-       xstrtoumax.$(OBJEXT)
+       xstrtoumax.$(OBJEXT) excludefast.$(OBJEXT)
 libtar_a_OBJECTS = $(am_libtar_a_OBJECTS)
 
 DEFS = @DEFS@
 DEFAULT_INCLUDES =  -I. -I$(srcdir) -I$(top_builddir)
 CPPFLAGS = @CPPFLAGS@
@@ -186,11 +188,11 @@
 @AMDEP_TRUE@   $(DEPDIR)/strtoul.Po $(DEPDIR)/strtoull.Po \
 @AMDEP_TRUE@   $(DEPDIR)/strtoumax.Po $(DEPDIR)/unicodeio.Po \
 @AMDEP_TRUE@   $(DEPDIR)/waitpid.Po $(DEPDIR)/xgetcwd.Po \
 @AMDEP_TRUE@   $(DEPDIR)/xmalloc.Po $(DEPDIR)/xstrdup.Po \
 @AMDEP_TRUE@   $(DEPDIR)/xstrtoimax.Po $(DEPDIR)/xstrtoul.Po \
-@AMDEP_TRUE@   $(DEPDIR)/xstrtoumax.Po
+@AMDEP_TRUE@   $(DEPDIR)/xstrtoumax.Po $(DEPDIR)/excludefast.Po
 COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \
        $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS)
 CCLD = $(CC)
 LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(AM_LDFLAGS) $(LDFLAGS) -o $@
 CFLAGS = @CFLAGS@
@@ -238,10 +240,11 @@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/backupfile.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/basename.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/dirname.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/error.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/exclude.Po@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/excludefast.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/fileblocks.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/fnmatch.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/ftruncate.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/full-write.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@$(DEPDIR)/getdate.Po@am__quote@
diff --new-file -r -U 5 tar-1.13.25/lib/exclude.c tar-1.13.25pp/lib/exclude.c
--- tar-1.13.25/lib/exclude.c   Mon Sep  3 14:26:33 2001
+++ tar-1.13.25pp/lib/exclude.c Fri May  3 17:39:17 2002
@@ -54,10 +54,11 @@
 #  include <stdint.h>
 # endif
 #endif
 
 #include "exclude.h"
+#include "excludefast.h"
 #include "fnmatch.h"
 #include "xalloc.h"
 
 #ifndef SIZE_MAX
 # define SIZE_MAX ((size_t) -1)
@@ -76,28 +77,75 @@
    ORed with EXCLUDE_* options.  */
 
 struct patopts
   {
     char const *pattern;
+    int exclude_idx;
     int options;
   };
 
 /* An exclude list, of pattern-options pairs.  */
 
+struct excludefast_opt
+  {
+    struct excludefast* ef;
+    int options;
+    struct excludefast_opt* next;
+  };
+
+/* need a seperate excludefast hash for each different option setting */
+
 struct exclude
   {
+    struct excludefast_opt* nowildcards; 
+    
     struct patopts *exclude;
     size_t exclude_alloc;
     size_t exclude_count;
   };
 
+static void simplify_name_in_place(char* filename, int options) {
+  /* Removes any redundant characters to return an                   */
+  /* unambiguous path+file string according to the options.          */
+  /* Note that '../' patterns in the filename will not be simplified */
+  /* and such patterns should not be supplied to exclude             */
+  
+  char* get = filename;
+  if (! (options & FNM_LEADING_DIR))
+    {
+      /* Remove leading '/' or './' */
+      while (*get == '/' || (*get == '.' && *(get+1) == '/')) ++get;
+    }
+  
+  while (*get) {
+    /* Replace any '/./' or '//' with a single '/' */
+    while (*get && *filename == '/' &&
+           (*(get+1) == '/' ||
+           (*(get+1) == '.' && *(get+2) == '/'))) ++get;
+
+    /* Remove trailing '/' */
+    if (!*get) break;
+    if (*(get+1) == 0 && *get == '/') break;
+
+    /* Copy in-place */
+    if (options & FNM_CASEFOLD)
+      *filename = tolower(*get);
+    else
+      *filename = *get;
+    
+    ++filename; ++get;
+  }
+  *filename = 0;
+}
+
 /* Return a newly allocated and empty exclude list.  */
 
 struct exclude *
 new_exclude (void)
 {
   struct exclude *ex = (struct exclude *) xmalloc (sizeof *ex);
+  ex->nowildcards = 0;
   ex->exclude_count = 0;
   ex->exclude_alloc = (1 << 6); /* This must be a power of 2.  */
   ex->exclude = (struct patopts *) xmalloc (ex->exclude_alloc
                                            * sizeof ex->exclude[0]);
   return ex;
@@ -106,116 +154,174 @@
 /* Free the storage associated with an exclude list.  */
 
 void
 free_exclude (struct exclude *ex)
 {
-  free (ex->exclude);
-  free (ex);
-}
-
-/* Return zero if PATTERN matches F, obeying OPTIONS, except that
-   (unlike fnmatch) wildcards are disabled in PATTERN.  */
-
-static int
-fnmatch_no_wildcards (char const *pattern, char const *f, int options)
-{
-  if (! (options & FNM_LEADING_DIR))
-    return ((options & FNM_CASEFOLD)
-           ? strcasecmp (pattern, f)
-           : strcmp (pattern, f));
-  else
+  struct excludefast_opt* efo = ex->nowildcards;
+  while (efo)
     {
-      size_t patlen = strlen (pattern);
-      int r = ((options & FNM_CASEFOLD)
-               ? strncasecmp (pattern, f, patlen)
-               : strncmp (pattern, f, patlen));
-      if (! r)
-       {
-         r = f[patlen];
-         if (r == '/')
-           r = 0;
-       }
-      return r;
+      struct excludefast_opt* next = efo->next;
+      free_excludefast( efo->ef );
+      free(efo);
+      efo = next;
     }
+  free (ex->exclude);
+  free (ex);
 }
 
 /* Return true if EX excludes F.  */
 
 bool
 excluded_filename (struct exclude const *ex, char const *f)
 {
   size_t exclude_count = ex->exclude_count;
-
-  /* If no options are given, the default is to include.  */
-  if (exclude_count == 0)
-    return 0;
-  else
+  int max_exclude_idx = 0;
+  int options_used;
+  
+  /* Is this in any of our fast non-regex hashes? */
+  struct excludefast_opt* efo = ex->nowildcards;
+  while (efo)
+    {
+      int exclude_idx = excludedfast_filename( efo->ef, f, efo->options );
+      if (exclude_idx > max_exclude_idx)
+        {
+          max_exclude_idx = exclude_idx;
+          options_used = efo->options;
+        }
+      efo = efo->next;
+    }
+  
+  if (exclude_count > 0)
     {
       struct patopts const *exclude = ex->exclude;
-      size_t i;
-
-      /* Otherwise, the default is the opposite of the first option.  */
-      bool excluded = !! (exclude[0].options & EXCLUDE_INCLUDE);
+      int i;
 
       /* Scan through the options, seeing whether they change F from
         excluded to included or vice versa.  */
-      for (i = 0;  i < exclude_count;  i++)
+      for (i = exclude_count-1;  i >= 0;  --i)
        {
          char const *pattern = exclude[i].pattern;
          int options = exclude[i].options;
-         if (excluded == !! (options & EXCLUDE_INCLUDE))
+
+         if (exclude[i].exclude_idx > max_exclude_idx)
            {
-             int (*matcher) PARAMS ((char const *, char const *, int)) =
-               (options & EXCLUDE_WILDCARDS
-                ? fnmatch
-                : fnmatch_no_wildcards);
-             bool matched = ((*matcher) (pattern, f, options) == 0);
+             bool matched = ( fnmatch (pattern, f, options) == 0);
              char const *p;
 
              if (! (options & EXCLUDE_ANCHORED))
                for (p = f; *p && ! matched; p++)
                  if (*p == '/' && p[1] != '/')
-                   matched = ((*matcher) (pattern, p + 1, options) == 0);
+                   matched = ( fnmatch (pattern, p + 1, options) == 0);
 
-             excluded ^= matched;
+             if (matched)
+                {
+                  max_exclude_idx = exclude[i].exclude_idx;
+                  options_used = options;
+                  break;
+                }
            }
        }
-
-      return excluded;
     }
+
+  if ( max_exclude_idx > 0 ) /* Have found a match */
+      return !( options_used & EXCLUDE_INCLUDE ); /* was it an include or 
exclude */
+
+  return 0; /* Have found no match */
 }
 
 /* Append to EX the exclusion PATTERN with OPTIONS.  */
 
 void
 add_exclude (struct exclude *ex, char const *pattern, int options)
 {
-  struct patopts *patopts;
-
-  if (ex->exclude_alloc <= ex->exclude_count)
+  static int exclude_idx = 0;
+  ++exclude_idx;
+  
+  if (options & EXCLUDE_WILDCARDS)
     {
-      size_t s = 2 * ex->exclude_alloc;
-      if (! (0 < s && s <= SIZE_MAX / sizeof ex->exclude[0]))
-       xalloc_die ();
-      ex->exclude_alloc = s;
-      ex->exclude = (struct patopts *) xrealloc (ex->exclude,
-                                                s * sizeof ex->exclude[0]);
+      /* Do we really have a wildcard? */
+      char const* check = pattern;
+      bool wildcard = false;
+      
+      for (check = pattern; !wildcard && *check; ++check)
+        {
+          /* This may be quicker with a lookup table, */
+          /* but a good compiler would make a table from this */
+          switch (*check)
+            {
+            case '?':
+            case '*':
+            case '[':
+            case ']':
+              wildcard = true;
+              break;
+            default:
+              /* Do nothing */
+              break;
+            }
+        }
+      if (!wildcard)
+        {
+          /* No wildcards in the pattern, it is safe to turn off */
+          /* the wildcard option */
+          options ^= EXCLUDE_WILDCARDS;
+        }
+    }
+  
+  if ((options & EXCLUDE_WILDCARDS) ||
+      !(options & EXCLUDE_ANCHORED))
+    {
+      /* Add this pattern to the regexp list */
+      struct patopts *patopts;
+      
+      if (ex->exclude_alloc <= ex->exclude_count)
+        {
+          size_t s = 2 * ex->exclude_alloc;
+          if (! (0 < s && s <= SIZE_MAX / sizeof ex->exclude[0]))
+            xalloc_die ();
+          ex->exclude_alloc = s;
+          ex->exclude = (struct patopts *) xrealloc (ex->exclude,
+                                                     s * sizeof 
ex->exclude[0]);
+        }
+
+      patopts = &ex->exclude[ex->exclude_count++];
+      patopts->pattern = pattern;
+      patopts->options = options;
+      patopts->exclude_idx = exclude_idx;
+    }
+  else
+    {
+      /* Safe for this pattern to join a hashed list */
+      struct excludefast_opt* efo = ex->nowildcards;
+      bool done = false;
+      while (!done && efo)
+        {
+          if (efo->options == options)
+            {
+              add_excludefast( efo->ef, pattern, options, exclude_idx );
+              done = true;
+            }
+          efo = efo->next;
+        }
+      if (!done)
+        {
+          efo = ex->nowildcards;
+          ex->nowildcards = (struct excludefast_opt*)malloc(sizeof(struct 
excludefast_opt));
+          ex->nowildcards->next = efo;
+          ex->nowildcards->options = options;
+          ex->nowildcards->ef = new_excludefast( simplify_name_in_place );
+          add_excludefast( ex->nowildcards->ef, pattern, options, exclude_idx 
);
+        }
     }
-
-  patopts = &ex->exclude[ex->exclude_count++];
-  patopts->pattern = pattern;
-  patopts->options = options;
 }
 
 /* Use ADD_FUNC to append to EX the patterns in FILENAME, each with
    OPTIONS.  LINE_END terminates each pattern in the file.  Return -1
    on failure, 0 on success.  */
 
 int
-add_exclude_file (void (*add_func) PARAMS ((struct exclude *,
-                                           char const *, int)),
-                 struct exclude *ex, char const *filename, int options,
+add_exclude_file (struct exclude *ex, char const *filename, int options,
                  char line_end)
 {
   bool use_stdin = filename[0] == '-' && !filename[1];
   FILE *in;
   char *buf;
@@ -256,11 +362,11 @@
 
   for (pattern = p = buf, lim = buf + buf_count;  p <= lim;  p++)
     if (p < lim ? *p == line_end : buf < p && p[-1])
       {
        *p = '\0';
-       (*add_func) (ex, pattern, options);
+       add_exclude (ex, pattern, options);
        pattern = p + 1;
       }
 
   errno = e;
   return e ? -1 : 0;
diff --new-file -r -U 5 tar-1.13.25/lib/exclude.h tar-1.13.25pp/lib/exclude.h
--- tar-1.13.25/lib/exclude.h   Sun Aug 26 19:07:15 2001
+++ tar-1.13.25pp/lib/exclude.h Fri May  3 06:09:42 2002
@@ -42,8 +42,7 @@
 struct exclude;
 
 struct exclude *new_exclude PARAMS ((void));
 void free_exclude PARAMS ((struct exclude *));
 void add_exclude PARAMS ((struct exclude *, char const *, int));
-int add_exclude_file PARAMS ((void (*) (struct exclude *, char const *, int),
-                             struct exclude *, char const *, int, char));
+int add_exclude_file PARAMS ((struct exclude *, char const *, int, char));
 bool excluded_filename PARAMS ((struct exclude const *, char const *));
diff --new-file -r -U 5 tar-1.13.25/lib/excludefast.c 
tar-1.13.25pp/lib/excludefast.c
--- tar-1.13.25/lib/excludefast.c       Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/lib/excludefast.c     Fri May  3 17:29:56 2002
@@ -0,0 +1,177 @@
+/* excludefast.c -- exclude file names
+   Copyright 1992, 1993, 1994, 1997 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING.
+   If not, write to the Free Software Foundation,
+   59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Copied from code by Paul Eggert <[email protected]> */
+/* With modifications by Phil Proudman <[email protected]> */
+/*                       and Mark Oakden <[email protected]> */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#if HAVE_STDBOOL_H
+# include <stdbool.h>
+#else
+typedef enum {false = 0, true = 1} bool;
+#endif
+
+#include <errno.h>
+#ifndef errno
+extern int errno;
+#endif
+#include <stdio.h>
+#if HAVE_SYS_TYPES_H
+# include <sys/types.h>
+#endif
+#if HAVE_STDLIB_H
+# include <stdlib.h>
+#endif
+#if HAVE_STRING_H
+# include <string.h>
+#endif
+#if HAVE_STRINGS_H
+# include <strings.h>
+#endif
+#if HAVE_INTTYPES_H
+# include <inttypes.h>
+#else
+# if HAVE_STDINT_H
+#  include <stdint.h>
+# endif
+#endif
+
+#include <excludefast.h>
+#include <exclude.h>
+#include <fnmatch.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include "hash.h"
+
+#ifndef SIZE_MAX
+# define SIZE_MAX ((size_t) -1)
+#endif
+
+void *xmalloc PARAMS ((size_t));
+void *xrealloc PARAMS ((void *, size_t));
+
+/* An exclude pattern-options pair.  The options are fnmatch options
+   ORed with EXCLUDE_* options.  */
+
+struct efpatopts
+  {
+    char *pattern;
+    int exclude_idx;
+    /* int options;  Not used at present */
+  };
+
+struct excludefast
+  {
+    excludefast_process process_filename;
+    Hash_table *h_table;
+  };
+
+/* Calculate the hash of a efpatopts.  */
+static unsigned
+hash_efpatopts (void const *entry, unsigned n_buckets)
+{
+  struct efpatopts const *efpatopts = entry;
+  return hash_string (efpatopts->pattern, n_buckets);
+}
+
+/* Compare two directories for equality.  */
+static bool
+compare_efpatopts (void const *entry1, void const *entry2)
+{
+  struct efpatopts const *efpatopts1 = entry1;
+  struct efpatopts const *efpatopts2 = entry2;
+  return strcmp (efpatopts1->pattern, efpatopts2->pattern) == 0;
+}
+
+static void
+free_efpatopts (void *entry1)
+{
+  struct efpatopts *efpatopts1 = entry1;
+  free(efpatopts1->pattern);
+}
+
+struct excludefast *
+new_excludefast (excludefast_process process_filename)
+{
+  struct excludefast *ex = (struct excludefast *) xmalloc (sizeof (struct 
excludefast));
+  ex->h_table = hash_initialize (0, 0, hash_efpatopts,
+                                 compare_efpatopts, free_efpatopts);
+  ex->process_filename = process_filename;
+  return ex;
+}
+
+void free_excludefast(struct excludefast *ex) {
+  hash_clear(ex->h_table);
+}
+
+int
+excludedfast_filename (struct excludefast *ex, char const *f, int options)
+{
+  char* fc;
+  char* tmp;
+  const struct efpatopts* pm;
+  int retval = 0;
+  
+  struct efpatopts fn;
+
+  /* Make a copy of f performing the same filename preprossing */
+  /* so that we can find it in the hash */
+  fc = strdup(f);
+  ex->process_filename(fc, options);
+  fn.pattern = fc;
+  
+  while (retval == 0) {
+    pm = hash_lookup (ex->h_table, &fn);
+    if (pm)
+      retval = pm->exclude_idx;
+    else {
+      /* See if parent dir is excluded */
+      for(tmp=fc; *tmp != '\0'; ++tmp);
+      while(tmp>fc && *tmp != '/') --tmp;
+      if (tmp == fc)
+        break; /* There is no parent to check */
+      else
+        *tmp = '\0';
+    }
+  }
+
+  free(fc);
+  return retval;
+}
+
+void
+add_excludefast (struct excludefast *ex, char const *pattern, int options, int 
exclude_idx)
+{
+  char* p;
+  struct efpatopts *efpatopts;
+
+  efpatopts = (struct efpatopts *)malloc(sizeof(struct efpatopts));
+  
+  p = strdup( pattern );
+  ex->process_filename( p, options );
+  
+  efpatopts->pattern = p;
+  efpatopts->exclude_idx = exclude_idx;
+
+  hash_insert (ex->h_table, efpatopts);
+}
+
diff --new-file -r -U 5 tar-1.13.25/lib/excludefast.h 
tar-1.13.25pp/lib/excludefast.h
--- tar-1.13.25/lib/excludefast.h       Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/lib/excludefast.h     Fri May  3 15:01:53 2002
@@ -0,0 +1,39 @@
+/* excludefast.h -- declarations for excluding file names
+   Copyright 1992, 1993, 1994, 1997 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING.
+   If not, write to the Free Software Foundation,
+   59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+/* Copied from code by Paul Eggert <[email protected]> */
+/* With modifications by Phil Proudman <[email protected]> */
+/*                       and Mark Oakden <[email protected]> */
+
+#ifndef PARAMS
+# if defined PROTOTYPES || (defined __STDC__ && __STDC__)
+#  define PARAMS(Args) Args
+# else
+#  define PARAMS(Args) ()
+# endif
+#endif
+
+struct excludefast;
+
+/* The following will pre-process filenames before storing in the hash */
+typedef void (*excludefast_process) PARAMS ((char*, int));
+
+struct excludefast *new_excludefast PARAMS ((excludefast_process));
+void free_excludefast PARAMS ((struct excludefast *));
+void add_excludefast PARAMS ((struct excludefast *, char const *, int, int));
+int excludedfast_filename PARAMS ((struct excludefast *, char const *, int));
diff --new-file -r -U 5 tar-1.13.25/src/tar.c tar-1.13.25pp/src/tar.c
--- tar-1.13.25/src/tar.c       Thu Sep 20 20:11:27 2001
+++ tar-1.13.25pp/src/tar.c     Fri May  3 19:55:06 2002
@@ -128,15 +128,17 @@
 {
   ANCHORED_OPTION = CHAR_MAX + 1,
   BACKUP_OPTION,
   DELETE_OPTION,
   EXCLUDE_OPTION,
+  EXCLUDE_INCLUDE_OPTION,
   GROUP_OPTION,
   IGNORE_CASE_OPTION,
   MODE_OPTION,
   NEWER_MTIME_OPTION,
   NO_ANCHORED_OPTION,
+  NO_EXCLUDE_INCLUDE_OPTION,
   NO_IGNORE_CASE_OPTION,
   NO_WILDCARDS_OPTION,
   NO_WILDCARDS_MATCH_SLASH_OPTION,
   NULL_OPTION,
   OVERWRITE_OPTION,
@@ -197,10 +199,11 @@
   {"dereference", no_argument, 0, 'h'},
   {"diff", no_argument, 0, 'd'},
   {"directory", required_argument, 0, 'C'},
   {"exclude", required_argument, 0, EXCLUDE_OPTION},
   {"exclude-from", required_argument, 0, 'X'},
+  {"exclude-include", no_argument, 0, EXCLUDE_INCLUDE_OPTION},
   {"extract", no_argument, 0, 'x'},
   {"file", required_argument, 0, 'f'},
   {"files-from", required_argument, 0, 'T'},
   {"force-local", no_argument, &force_local_option, 1},
   {"get", no_argument, 0, 'x'},
@@ -225,10 +228,11 @@
   {"new-volume-script", required_argument, 0, 'F'},
   {"newer", required_argument, 0, 'N'},
   {"newer-mtime", required_argument, 0, NEWER_MTIME_OPTION},
   {"null", no_argument, 0, NULL_OPTION},
   {"no-anchored", no_argument, 0, NO_ANCHORED_OPTION},
+  {"no-exclude-include", no_argument, 0, NO_EXCLUDE_INCLUDE_OPTION},
   {"no-ignore-case", no_argument, 0, NO_IGNORE_CASE_OPTION},
   {"no-wildcards", no_argument, 0, NO_WILDCARDS_OPTION},
   {"no-wildcards-match-slash", no_argument, 0, 
NO_WILDCARDS_MATCH_SLASH_OPTION},
   {"no-recursion", no_argument, &recursion_option, 0},
   {"no-same-owner", no_argument, &same_owner_option, -1},
@@ -400,10 +404,13 @@
       --no-ignore-case         exclusion is case sensitive (default)\n\
       --wildcards              exclude patterns use wildcards (default)\n\
       --no-wildcards           exclude patterns are plain strings\n\
       --wildcards-match-slash  exclude pattern wildcards match '/' (default)\n\
       --no-wildcards-match-slash exclude pattern wildcards do not match '/'\n\
+      --exclude-include        inverts all following exclusions so that 
matching\n\
+                               files will be included\n\
+      --no-exclude-include     all following exclusions will be omitted 
(default)\n\
   -P, --absolute-names         don't strip leading `/'s from file names\n\
   -h, --dereference            dump instead the files symlinks point to\n\
       --no-recursion           avoid descending automatically in directories\n\
   -l, --one-file-system        stay in local file system when creating 
archive\n\
   -K, --starting-file=NAME     begin at file NAME in the archive\n"),
@@ -494,11 +501,11 @@
 {
   int optchar;                 /* option letter */
   int input_files;             /* number of input files */
   const char *backup_suffix_string;
   const char *version_control_string = 0;
-  int exclude_options = EXCLUDE_WILDCARDS;
+  int exclude_options = EXCLUDE_WILDCARDS | EXCLUDE_ANCHORED;
 
   /* Set some default option values.  */
 
   subcommand_option = UNKNOWN_SUBCOMMAND;
   archive_format = DEFAULT_FORMAT;
@@ -864,11 +871,11 @@
       case 'x':
        set_subcommand_option (EXTRACT_SUBCOMMAND);
        break;
 
       case 'X':
-       if (add_exclude_file (add_exclude, excluded, optarg,
+       if (add_exclude_file (excluded, optarg,
                              exclude_options | recursion_option, '\n')
            != 0)
          {
            int e = errno;
            FATAL_ERROR ((0, e, "%s", quotearg_colon (optarg)));
@@ -887,18 +894,17 @@
 
       case 'Z':
        set_use_compress_program_option ("compress");
        break;
 
-      case OBSOLETE_VERSION_CONTROL:
-       WARN ((0, 0, _("Obsolete option name replaced by --backup")));
-       /* Fall through.  */
-
       case ANCHORED_OPTION:
        exclude_options |= EXCLUDE_ANCHORED;
        break;
 
+      case OBSOLETE_VERSION_CONTROL:
+       WARN ((0, 0, _("Obsolete option name replaced by --backup")));
+       /* Fall through.  */
       case BACKUP_OPTION:
        backup_option = 1;
        if (optarg)
          version_control_string = optarg;
        break;
@@ -949,10 +955,14 @@
 
       case NO_WILDCARDS_OPTION:
        exclude_options &= ~ EXCLUDE_WILDCARDS;
        break;
 
+      case NO_EXCLUDE_INCLUDE_OPTION:
+       exclude_options &= ~ EXCLUDE_INCLUDE;
+       break;
+
       case NO_WILDCARDS_MATCH_SLASH_OPTION:
        exclude_options |= FNM_FILE_NAME;
        break;
 
       case NULL_OPTION:
@@ -1032,10 +1042,14 @@
        volno_file_option = optarg;
        break;
 
       case WILDCARDS_OPTION:
        exclude_options |= EXCLUDE_WILDCARDS;
+       break;
+
+      case EXCLUDE_INCLUDE_OPTION:
+       exclude_options |= EXCLUDE_INCLUDE;
        break;
 
       case WILDCARDS_MATCH_SLASH_OPTION:
        exclude_options &= ~ FNM_FILE_NAME;
        break;
diff --new-file -r -U 5 tar-1.13.25/tests/Makefile.in 
tar-1.13.25pp/tests/Makefile.in
--- tar-1.13.25/tests/Makefile.in       Wed Sep 26 16:35:03 2001
+++ tar-1.13.25pp/tests/Makefile.in     Fri May  3 20:03:04 2002
@@ -109,12 +109,13 @@
 
 TESTS = version.sh \
   append.sh delete01.sh delete02.sh delete03.sh \
   extrac01.sh extrac02.sh extrac03.sh extrac04.sh \
   gzip.sh incremen.sh ignfail.sh \
-  old.sh volume.sh
-
+  old.sh volume.sh  exclude01.sh exclude02.sh \
+  exclude03.sh exclude04.sh exclude05.sh \
+  exclude06.sh exclude07.sh
 
 genfile_SOURCES = genfile.c
 EXTRA_DIST = after before preset.in $(TESTS)
 
 localedir = $(datadir)/locale
diff --new-file -r -U 5 tar-1.13.25/tests/exclude01.sh 
tar-1.13.25pp/tests/exclude01.sh
--- tar-1.13.25/tests/exclude01.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude01.sh    Fri May  3 20:32:16 2002
@@ -0,0 +1,33 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir/dir1 testdir/dir2
+genfile -l 500 > testdir/dir1/file1
+genfile -l 124 > testdir/dir1/\*
+genfile -l 200 > testdir/dir2/file2
+genfile -l 200 > testdir/dir2/\*
+
+# Because we are matching wildcards when trying to exclude
+# the dir1/\* we also loose dir1/file1 (since * is wild)
+#  - this is stopped with --no-wildcards as seen before
+#    excluding dir2/\*
+
+tar cf archive --exclude=testdir/dir1/\* \
+               --no-wildcards \
+               --exclude=testdir/dir2/\* \
+               testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/dir1/
+testdir/dir2/
+testdir/dir2/file2
+"
+
+. $srcdir/after
diff --new-file -r -U 5 tar-1.13.25/tests/exclude02.sh 
tar-1.13.25pp/tests/exclude02.sh
--- tar-1.13.25/tests/exclude02.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude02.sh    Fri May  3 14:48:37 2002
@@ -0,0 +1,25 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir/dir
+genfile -l 5000 > testdir/file1
+genfile -l 1024 > testdir/file2
+genfile -l 200 > testdir/dir/file3
+genfile -l 200 > testdir/dir/fi\*le4
+
+tar cf archive --exclude=testdir/file2 \
+               --exclude=testdir/dir \
+               testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/file1
+"
+
+. $srcdir/after
diff --new-file -r -U 5 tar-1.13.25/tests/exclude03.sh 
tar-1.13.25pp/tests/exclude03.sh
--- tar-1.13.25/tests/exclude03.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude03.sh    Fri May  3 14:47:48 2002
@@ -0,0 +1,28 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir/dir
+genfile -l 5000 > testdir/file1
+genfile -l 1024 > testdir/file2
+genfile -l 200 > testdir/dir/file3
+genfile -l 200 > testdir/dir/fi\*le4
+
+tar cf archive --exclude=testdir/./file2 \
+               --exclude=./testdir//./dir/\* \
+               --exclude=testdir/dir/fi\*le4 \
+               testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/dir/
+testdir/dir/file3
+testdir/file1
+"
+
+. $srcdir/after
diff --new-file -r -U 5 tar-1.13.25/tests/exclude04.sh 
tar-1.13.25pp/tests/exclude04.sh
--- tar-1.13.25/tests/exclude04.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude04.sh    Fri May  3 14:51:10 2002
@@ -0,0 +1,23 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir
+genfile -l 5000 > "./testdir/file1.txt"
+genfile -l 1024 > testdir/file2
+
+tar cf archive --no-anchored \
+               --exclude="file1.txt" \
+               testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/file2
+"
+
+. $srcdir/after
diff --new-file -r -U 5 tar-1.13.25/tests/exclude05.sh 
tar-1.13.25pp/tests/exclude05.sh
--- tar-1.13.25/tests/exclude05.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude05.sh    Fri May  3 14:51:03 2002
@@ -0,0 +1,23 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir
+genfile -l 5000 > "./testdir/file1.txt"
+genfile -l 1024 > testdir/file2
+
+tar cf archive --exclude="file1.txt" \
+               testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/file1.txt
+testdir/file2
+"
+
+. $srcdir/after
diff --new-file -r -U 5 tar-1.13.25/tests/exclude06.sh 
tar-1.13.25pp/tests/exclude06.sh
--- tar-1.13.25/tests/exclude06.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude06.sh    Fri May  3 20:04:37 2002
@@ -0,0 +1,28 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir/dir
+genfile -l 200 > testdir/file1
+genfile -l 200 > testdir/file2
+genfile -l 200 > testdir/dir/file3
+genfile -l 200 > testdir/dir/fi\*le4
+
+tar cf archive --ignore-case \
+               --exclude=tesTdir/File2 \
+               --exclude=testdir/DIR \
+               --no-ignore-case \
+               --exclude=testdir/File1 \
+               testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/file1
+"
+
+. $srcdir/after
diff --new-file -r -U 5 tar-1.13.25/tests/exclude07.sh 
tar-1.13.25pp/tests/exclude07.sh
--- tar-1.13.25/tests/exclude07.sh      Wed Dec 31 19:00:00 1969
+++ tar-1.13.25pp/tests/exclude07.sh    Fri May  3 20:19:49 2002
@@ -0,0 +1,44 @@
+#! /bin/sh
+
+. ./preset
+. $srcdir/before
+
+set -e
+rm -rf testdir
+mkdir -p testdir
+genfile -l 200 > testdir/fast1
+genfile -l 200 > testdir/fast2
+genfile -l 200 > testdir/file~
+genfile -l 200 > testdir/file_bad.c~
+genfile -l 200 > testdir/file_good.c~
+
+# This tests the hash exclusion mixing with regexp, and
+# inverting the exclude_include flag
+
+# testdir/fast*
+#   I want testdir/fast1 and testdir/fast2 excluded EXCEPT
+#   all files ending in 1
+
+# testdir/file*
+#   I want all files ending ~ to be excluded UNLESS
+#   the file ends in .c~ - but with the exception of testdir/file_bad.c~
+
+tar cf archive   --exclude=*~ \
+                 --exclude=testdir/fast1 \
+                 --exclude=testdir/fast2 \
+               --exclude-include \
+                 --exclude=*.c~ \
+                 --exclude=*1 \
+               --no-exclude-include \
+                 --exclude=testdir/file_bad.c~ \
+                 testdir
+tar tf archive
+rm -rf testdir
+
+out="\
+testdir/
+testdir/fast1
+testdir/file_good.c~
+"
+
+. $srcdir/after

--- End Message ---
--- Begin Message ---
On Tue, 18 Nov 2003 11:59:58 -0500 Brian Ristuccia
<[email protected]> wrote:
> Package: tar
> Version: 1.13.25-2
> 
> Tar will consume inordinate amounts of cpu time if you use the
> --exclude-from=file with a large file or many --exclude options. The time
> spent winds up being O(potential_archive_members * total_exclusions), which
> for any meaningful number of exclusions is a very, very big number. For
> example, with an 8MB exclude list containing aprox 90,000 items, tar runs at
> a rate of about 1 file per second on a 1400mhz pIII with 512KB CPU cache!
> Even if my tape drive and disks were infinitely fast, it would still take
> aproximately 43 hours for tar to back up my system with 158205 files.
> 
> The attached patch causes tar to use a hashed data structure to store the
> exclusions, resulting in acceptable performance. See
> http://www.proudman51.freeserve.co.uk/tar.html
> 
> -- 
> Brian Ristuccia

This bug was fixed but apparently not marked as "-done" in the 1.23
upload.  Fixing that now with this mail. :)

Thanks,
~Niels

--- End Message ---

Reply via email to