Thanks for looking into this; it's long been on my plate but I haven't had time to work on the proper solution, which is basically to rewrite glob from scratch (this should make it considerably faster).

As far as your patch goes:

Gnulib prefers spaces to tabs.

The code unnecessarily calls strlen (directory).

The patch mishandles a directory "/" and a file "x", as it stats "//x" but this may differ from "/x" on some systems.

For speed the code should prefer fstatat on d.name to stat on the full name. Too bad glob_t doesn't have a gl_fstatat entry, but we can use fstatat when GLOB_ALTDIRFUNC is not in use.

The patch goes through a loop calling alloca_account, which is not good: it'll waste the stack. Better to use a scratch buffer, as in other parts of glob.c.

The code should prefer mempcpy and memcpy to stpcpy and strcpy (as in the rest of glob.c). That way, the Gnulib glob module needn't depend on the stpcpy module (plus the code's more reliable if another thread stomps on our strings between strlen and strcpy :-).

I don't see how the patch fixes the case where readdir_result_type (d) returns DT_LNK; this might be a symlink to a directory.

Proposed pair of patches attached (I haven't installed these). The first is yours but with tabs turned to spaces and with ChangeLog equal to your log entry. The second contains my proposed improvements. I haven't tested except on the Gnulib test cases (which aren't much).

Of course performance will suffer with all these correctness patches, but that can wait until a rewrite.
From 69247bc996c50da0564f6157358ba99b13abbd16 Mon Sep 17 00:00:00 2001
From: DJ Delorie <d...@redhat.com>
Date: Fri, 11 Mar 2022 16:43:39 -0500
Subject: [PATCH 1/2] glob: resolve DT_UNKNOWN via is_dir

[v2: changed malloc failure from ignore to error; added support for
alloca; tested by copying to glibc and testing there]

The DT_* values returned by getdents (readdir) are only hints and
not required.  In fact, some Linux filesystems return DT_UNKNOWN
for most entries, regardless of actual type.  This causes make
to mis-match patterns with a trailing slash (via GLOB_ONLYDIR)
(see make's functions/wildcard test case).  Thus, this patch
detects that case and uses is_dir() to make the type known enough
for proper operation.

Performance in non-DT_UNKNOWN cases is not affected.

The lack of DT_* is a well known issue on older XFS installations
(for example, RHEL 7 and 8, Fedora 28) but can be recreated by
creating an XFS filesystem with flags that mimic older behavior:

$ fallocate -l 10G /xfs.fs
$ mkfs.xfs -n ftype=0 -m crc=0 -f /xfs.fs
$ mkdir /xfs
$ mount -o loop /xfs.fs /xfs
---
 ChangeLog  | 23 +++++++++++++++++++++++
 lib/glob.c | 28 +++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 7a6ade78c3..f39c749ad3 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,26 @@
+2022-03-11  DJ Delorie  <d...@redhat.com>
+
+	glob: resolve DT_UNKNOWN via is_dir
+
+	The DT_* values returned by getdents (readdir) are only hints and
+	not required.  In fact, some Linux filesystems return DT_UNKNOWN
+	for most entries, regardless of actual type.  This causes make
+	to mis-match patterns with a trailing slash (via GLOB_ONLYDIR)
+	(see make's functions/wildcard test case).  Thus, this patch
+	detects that case and uses is_dir() to make the type known enough
+	for proper operation.
+
+	Performance in non-DT_UNKNOWN cases is not affected.
+
+	The lack of DT_* is a well known issue on older XFS installations
+	(for example, RHEL 7 and 8, Fedora 28) but can be recreated by
+	creating an XFS filesystem with flags that mimic older behavior:
+
+	$ fallocate -l 10G /xfs.fs
+	$ mkfs.xfs -n ftype=0 -m crc=0 -f /xfs.fs
+	$ mkdir /xfs
+	$ mount -o loop /xfs.fs /xfs
+
 2022-03-11  Paul Eggert  <egg...@cs.ucla.edu>
 
 	regex: fix minor over-allocation
diff --git a/lib/glob.c b/lib/glob.c
index f8d8a306f2..0da46ac138 100644
--- a/lib/glob.c
+++ b/lib/glob.c
@@ -1381,7 +1381,33 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
               if (flags & GLOB_ONLYDIR)
                 switch (readdir_result_type (d))
                   {
-                  case DT_DIR: case DT_LNK: case DT_UNKNOWN: break;
+                  case DT_DIR: case DT_LNK: break;
+                  case DT_UNKNOWN:
+                    {
+                      /* The filesystem was too lazy to give us a hint,
+                         so we have to do it the hard way.  */
+                      char *fullpath, *p;
+                      bool isdir;
+                      int need = strlen (directory) + strlen (d.name) + 2;
+                      int use_alloca = glob_use_alloca (alloca_used, need);
+                      if (use_alloca)
+                        fullpath = alloca_account (need, alloca_used);
+                      else
+                        {
+                          fullpath = malloc (need);
+                          if (fullpath == NULL)
+                            goto memory_error;
+                        }
+                      p = stpcpy (fullpath, directory);
+                      *p++ = '/';
+                      strcpy (p, d.name);
+                      isdir = is_dir (fullpath, flags, pglob);
+                      if (!use_alloca)
+                        free (fullpath);
+                      if (isdir)
+                        break;
+                      continue;
+                    }
                   default: continue;
                   }
 
-- 
2.35.1

From 1682446c7bd0bf41c61816258096c6f6515d27f6 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 11 Mar 2022 16:35:59 -0800
Subject: [PATCH 2/2] glob: fix symlink and // issues; improve speed

* lib/glob.c: Include fcntl.h.
(dirfd) [_LIBC]: New macro.
(GLOB_STAT64, GLOB_LSTAT64): Remove.  Replace all uses with ...
(GLOB_FSTATAT64): ... this new macro.
(glob_in_dir): Treat DT_LNK like DT_UNKNOWN.
Use directory-relative fstatat unless GLOB_ALTDIRFUNC, or dirfd fails.
Avoid duplicate strlen (directory).
Work even if directory is "/", without turning it into "//".
Use a scratch buffer instead of by-hand alloca stuff.
Use mempcpy and memcpy instead of stpcpy and strcpy.
* modules/glob (Depends-on): Add dirfd, fstatat; remove stat.
---
 ChangeLog    | 15 +++++++++++
 lib/glob.c   | 74 +++++++++++++++++++++++++++-------------------------
 modules/glob |  3 ++-
 3 files changed, 55 insertions(+), 37 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f39c749ad3..88bb4db70a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2022-03-11  Paul Eggert  <egg...@cs.ucla.edu>
+
+	glob: fix symlink and // issues; improve speed
+	* lib/glob.c: Include fcntl.h.
+	(dirfd) [_LIBC]: New macro.
+	(GLOB_STAT64, GLOB_LSTAT64): Remove.  Replace all uses with ...
+	(GLOB_FSTATAT64): ... this new macro.
+	(glob_in_dir): Treat DT_LNK like DT_UNKNOWN.
+	Use directory-relative fstatat unless GLOB_ALTDIRFUNC, or dirfd fails.
+	Avoid duplicate strlen (directory).
+	Work even if directory is "/", without turning it into "//".
+	Use a scratch buffer instead of by-hand alloca stuff.
+	Use mempcpy and memcpy instead of stpcpy and strcpy.
+	* modules/glob (Depends-on): Add dirfd, fstatat; remove stat.
+
 2022-03-11  DJ Delorie  <d...@redhat.com>
 
 	glob: resolve DT_UNKNOWN via is_dir
diff --git a/lib/glob.c b/lib/glob.c
index 0da46ac138..5afb178153 100644
--- a/lib/glob.c
+++ b/lib/glob.c
@@ -28,6 +28,7 @@
 #include <glob.h>
 
 #include <errno.h>
+#include <fcntl.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <stdbool.h>
@@ -56,6 +57,7 @@
 # define sysconf(id) __sysconf (id)
 # define closedir(dir) __closedir (dir)
 # define opendir(name) __opendir (name)
+# define dirfd(str) __dirfd (str)
 # define readdir(str) __readdir64 (str)
 # define getpwnam_r(name, bufp, buf, len, res) \
     __getpwnam_r (name, bufp, buf, len, res)
@@ -69,11 +71,8 @@
 # ifndef GLOB_LSTAT
 #  define GLOB_LSTAT            gl_lstat
 # endif
-# ifndef GLOB_STAT64
-#  define GLOB_STAT64           __stat64
-# endif
-# ifndef GLOB_LSTAT64
-#  define GLOB_LSTAT64          __lstat64
+# ifndef GLOB_FSTATAT64
+#  define GLOB_FSTATAT64        __fstatat64
 # endif
 # include <shlib-compat.h>
 #else /* !_LIBC */
@@ -88,8 +87,7 @@
 # define struct_stat            struct stat
 # define struct_stat64          struct stat
 # define GLOB_LSTAT             gl_lstat
-# define GLOB_STAT64            stat
-# define GLOB_LSTAT64           lstat
+# define GLOB_FSTATAT64         fstatat
 #endif /* _LIBC */
 
 #include <fnmatch.h>
@@ -215,7 +213,8 @@ glob_lstat (glob_t *pglob, int flags, const char *fullname)
   } ust;
   return (__glibc_unlikely (flags & GLOB_ALTDIRFUNC)
           ? pglob->GLOB_LSTAT (fullname, &ust.st)
-          : GLOB_LSTAT64 (fullname, &ust.st64));
+          : GLOB_FSTATAT64 (AT_FDCWD, fullname, &ust.st64,
+                            AT_SYMLINK_NOFOLLOW));
 }
 
 /* Set *R = A + B.  Return true if the answer is mathematically
@@ -257,7 +256,8 @@ is_dir (char const *filename, int flags, glob_t const *pglob)
   struct_stat64 st64;
   return (__glibc_unlikely (flags & GLOB_ALTDIRFUNC)
           ? pglob->gl_stat (filename, &st) == 0 && S_ISDIR (st.st_mode)
-          : GLOB_STAT64 (filename, &st64) == 0 && S_ISDIR (st64.st_mode));
+          : (GLOB_FSTATAT64 (AT_FDCWD, filename, &st64, 0) == 0
+             && S_ISDIR (st64.st_mode)));
 }
 
 /* Find the end of the sub-pattern in a brace expression.  */
@@ -1283,6 +1283,8 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 {
   size_t dirlen = strlen (directory);
   void *stream = NULL;
+  struct scratch_buffer s;
+  scratch_buffer_init (&s);
 # define GLOBNAMES_MEMBERS(nnames) \
     struct globnames *next; size_t count; char *name[nnames];
   struct globnames { GLOBNAMES_MEMBERS (FLEXIBLE_ARRAY_MEMBER) };
@@ -1354,6 +1356,7 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
         }
       else
         {
+          int dfd = dirfd (stream);
           int fnm_flags = ((!(flags & GLOB_PERIOD) ? FNM_PERIOD : 0)
                            | ((flags & GLOB_NOESCAPE) ? FNM_NOESCAPE : 0));
           flags |= GLOB_MAGCHAR;
@@ -1381,34 +1384,32 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
               if (flags & GLOB_ONLYDIR)
                 switch (readdir_result_type (d))
                   {
-                  case DT_DIR: case DT_LNK: break;
-                  case DT_UNKNOWN:
-                    {
-                      /* The filesystem was too lazy to give us a hint,
-                         so we have to do it the hard way.  */
-                      char *fullpath, *p;
-                      bool isdir;
-                      int need = strlen (directory) + strlen (d.name) + 2;
-                      int use_alloca = glob_use_alloca (alloca_used, need);
-                      if (use_alloca)
-                        fullpath = alloca_account (need, alloca_used);
-                      else
-                        {
-                          fullpath = malloc (need);
-                          if (fullpath == NULL)
-                            goto memory_error;
-                        }
-                      p = stpcpy (fullpath, directory);
-                      *p++ = '/';
-                      strcpy (p, d.name);
-                      isdir = is_dir (fullpath, flags, pglob);
-                      if (!use_alloca)
-                        free (fullpath);
-                      if (isdir)
-                        break;
-                      continue;
-                    }
                   default: continue;
+                  case DT_DIR: break;
+                  case DT_LNK: case DT_UNKNOWN:
+                    /* The filesystem was too lazy to give us a hint,
+                       so we have to do it the hard way.  */
+                    if (__glibc_unlikely (dfd < 0 || flags & GLOB_ALTDIRFUNC))
+                      {
+                        size_t namelen = strlen (d.name);
+                        size_t need = dirlen + 1 + namelen + 1;
+                        if (s.length < need
+                            && !scratch_buffer_set_array_size (&s, need, 1))
+                          goto memory_error;
+                        char *p = mempcpy (s.data, directory, dirlen);
+                        *p = '/';
+                        p += p[-1] != '/';
+                        memcpy (p, d.name, namelen + 1);
+                        if (! is_dir (s.data, flags, pglob))
+                          continue;
+                      }
+                    else
+                      {
+                        struct_stat64 st64;
+                        if (! (GLOB_FSTATAT64 (dfd, d.name, &st64, 0) == 0
+                               && S_ISDIR (st64.st_mode)))
+                          continue;
+                      }
                   }
 
               if (fnmatch (pattern, d.name, fnm_flags) == 0)
@@ -1540,5 +1541,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
       __set_errno (save);
     }
 
+  scratch_buffer_free (&s);
   return result;
 }
diff --git a/modules/glob b/modules/glob
index ba270e8427..a43b71cfbb 100644
--- a/modules/glob
+++ b/modules/glob
@@ -17,8 +17,10 @@ alloca          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 builtin-expect  [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 closedir        [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 d-type          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
+dirfd           [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 flexmember      [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 fnmatch         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
+fstatat         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 getlogin_r      [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 libc-config     [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 memchr          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
@@ -26,7 +28,6 @@ mempcpy         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 opendir         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 readdir         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 scratch_buffer  [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
-stat            [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 stdbool         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 stdint          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 strdup          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
-- 
2.35.1

Reply via email to