On 3/22/22 21:34, DJ Delorie wrote:

Of course performance will suffer with all these correctness patches,
but that can wait until a rewrite.

Modern XFS and EXT filesystems should not hit these code paths at all,
except for symbolic links, and even then only with GLOB_ONLYDIR.

Right, though glob uses GLOB_ONLYDIR internally so this slower code will execute in some cases even when the user doesn't specify GLOB_ONLYDIR.


+# define dirfd(str) __dirfd (str)

This needs an #undef before it, else it causes build errors as glibc
already has a definition for dirfd() and it conflicts with this one.
Prudence says they should *all* be protected as such, but I only mention
the new one.

Thanks, I protected it with an ifdef instead as that's more likely perform better.

Also, I noticed that the Gnulib glob module needs to be GPL not LGPLv2+, since it now depends on GPL modules like fstatat. So I fixed this (Gnulib-specific) issue as well.


I contemplated the case of symbolic links; I couldn't find anything in
the standards about it but I went with "glob does what shell wildcards
do" and those followed links, and I think that makes sense, so OK.  I
added that case to my local test area and it seems to do what I think
people will expect it to do.

Thanks for checking that.

I installed the attached into Gnulib. The first two patches are the same as what I sent earlier, except with the ifdef and GPL changes mentioned above. The 3rd patch tests for glibc bug 25659, replaces glibc glob if the bug is present, and makes sure the resulting glob passes a sanity check.

I hope this lets us make the Gnulib and glibc glob.c identical.
From ab8b3afda2aadcfd1ab7fd9f7bed92467dfe24af Mon Sep 17 00:00:00 2001
From: DJ Delorie <d...@redhat.com>
Date: Wed, 23 Mar 2022 09:39:37 -0700
Subject: [PATCH 1/3] glob: resolve DT_UNKNOWN via is_dir

The DT_* values returned by getdents (readdir) are only hints and
not required.  In fact, some Linux filesystems return DT_UNKNOWN
for most entries, regardless of actual type.  This causes make
to mis-match patterns with a trailing slash (via GLOB_ONLYDIR)
(see make's functions/wildcard test case).  Thus, this patch
detects that case and uses is_dir() to make the type known enough
for proper operation.

Performance in non-DT_UNKNOWN cases is not affected.

The lack of DT_* is a well known issue on older XFS installations
(for example, RHEL 7 and 8, Fedora 28) but can be recreated by
creating an XFS filesystem with flags that mimic older behavior:

$ fallocate -l 10G /xfs.fs
$ mkfs.xfs -n ftype=0 -m crc=0 -f /xfs.fs
$ mkdir /xfs
$ mount -o loop /xfs.fs /xfs
---
 ChangeLog  | 23 +++++++++++++++++++++++
 lib/glob.c | 28 +++++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 5fab584974..8c50a52c78 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,26 @@
+2022-03-23  DJ Delorie  <d...@redhat.com>
+
+	glob: resolve DT_UNKNOWN via is_dir
+
+	The DT_* values returned by getdents (readdir) are only hints and
+	not required.  In fact, some Linux filesystems return DT_UNKNOWN
+	for most entries, regardless of actual type.  This causes make
+	to mis-match patterns with a trailing slash (via GLOB_ONLYDIR)
+	(see make's functions/wildcard test case).  Thus, this patch
+	detects that case and uses is_dir() to make the type known enough
+	for proper operation.
+
+	Performance in non-DT_UNKNOWN cases is not affected.
+
+	The lack of DT_* is a well known issue on older XFS installations
+	(for example, RHEL 7 and 8, Fedora 28) but can be recreated by
+	creating an XFS filesystem with flags that mimic older behavior:
+
+	$ fallocate -l 10G /xfs.fs
+	$ mkfs.xfs -n ftype=0 -m crc=0 -f /xfs.fs
+	$ mkdir /xfs
+	$ mount -o loop /xfs.fs /xfs
+
 2022-03-20  Jim Meyering  <meyer...@fb.com>
 
 	maint: bootstrap: split a too-long line
diff --git a/lib/glob.c b/lib/glob.c
index f8d8a306f2..0da46ac138 100644
--- a/lib/glob.c
+++ b/lib/glob.c
@@ -1381,7 +1381,33 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
               if (flags & GLOB_ONLYDIR)
                 switch (readdir_result_type (d))
                   {
-                  case DT_DIR: case DT_LNK: case DT_UNKNOWN: break;
+                  case DT_DIR: case DT_LNK: break;
+                  case DT_UNKNOWN:
+                    {
+                      /* The filesystem was too lazy to give us a hint,
+                         so we have to do it the hard way.  */
+                      char *fullpath, *p;
+                      bool isdir;
+                      int need = strlen (directory) + strlen (d.name) + 2;
+                      int use_alloca = glob_use_alloca (alloca_used, need);
+                      if (use_alloca)
+                        fullpath = alloca_account (need, alloca_used);
+                      else
+                        {
+                          fullpath = malloc (need);
+                          if (fullpath == NULL)
+                            goto memory_error;
+                        }
+                      p = stpcpy (fullpath, directory);
+                      *p++ = '/';
+                      strcpy (p, d.name);
+                      isdir = is_dir (fullpath, flags, pglob);
+                      if (!use_alloca)
+                        free (fullpath);
+                      if (isdir)
+                        break;
+                      continue;
+                    }
                   default: continue;
                   }
 
-- 
2.32.0

From 2f7f02986f9d338b5bb0e865bfd278678fb96325 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 23 Mar 2022 09:52:58 -0700
Subject: [PATCH 2/3] glob: fix symlink and // issues; improve speed

* lib/glob.c: Include fcntl.h.
(dirfd) [_LIBC]: New macro.
(GLOB_STAT64, GLOB_LSTAT64): Remove.  Replace all uses with ...
(GLOB_FSTATAT64): ... this new macro.
(glob_in_dir): Treat DT_LNK like DT_UNKNOWN.
Use directory-relative fstatat unless GLOB_ALTDIRFUNC, or dirfd fails.
Avoid duplicate strlen (directory).
Work even if directory is "/", without turning it into "//".
Use a scratch buffer instead of by-hand alloca stuff.
Use mempcpy and memcpy instead of stpcpy and strcpy.
* modules/glob (Depends-on): Add dirfd, fstatat.  Remove stat.
(License): Change from LGPLv2+ to GPL, since it depends on
fstatat.
---
 ChangeLog    | 17 ++++++++++++
 lib/glob.c   | 76 +++++++++++++++++++++++++++-------------------------
 modules/glob |  5 ++--
 3 files changed, 60 insertions(+), 38 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 8c50a52c78..a0d3519162 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,20 @@
+2022-03-23  Paul Eggert  <egg...@cs.ucla.edu>
+
+	glob: fix symlink and // issues; improve speed
+	* lib/glob.c: Include fcntl.h.
+	(dirfd) [_LIBC]: New macro.
+	(GLOB_STAT64, GLOB_LSTAT64): Remove.  Replace all uses with ...
+	(GLOB_FSTATAT64): ... this new macro.
+	(glob_in_dir): Treat DT_LNK like DT_UNKNOWN.
+	Use directory-relative fstatat unless GLOB_ALTDIRFUNC, or dirfd fails.
+	Avoid duplicate strlen (directory).
+	Work even if directory is "/", without turning it into "//".
+	Use a scratch buffer instead of by-hand alloca stuff.
+	Use mempcpy and memcpy instead of stpcpy and strcpy.
+	* modules/glob (Depends-on): Add dirfd, fstatat.  Remove stat.
+	(License): Change from LGPLv2+ to GPL, since it depends on
+	fstatat.
+
 2022-03-23  DJ Delorie  <d...@redhat.com>
 
 	glob: resolve DT_UNKNOWN via is_dir
diff --git a/lib/glob.c b/lib/glob.c
index 0da46ac138..52c79b4cd8 100644
--- a/lib/glob.c
+++ b/lib/glob.c
@@ -28,6 +28,7 @@
 #include <glob.h>
 
 #include <errno.h>
+#include <fcntl.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <stdbool.h>
@@ -56,6 +57,9 @@
 # define sysconf(id) __sysconf (id)
 # define closedir(dir) __closedir (dir)
 # define opendir(name) __opendir (name)
+# ifndef dirfd
+#  define dirfd(str) __dirfd (str)
+# endif
 # define readdir(str) __readdir64 (str)
 # define getpwnam_r(name, bufp, buf, len, res) \
     __getpwnam_r (name, bufp, buf, len, res)
@@ -69,11 +73,8 @@
 # ifndef GLOB_LSTAT
 #  define GLOB_LSTAT            gl_lstat
 # endif
-# ifndef GLOB_STAT64
-#  define GLOB_STAT64           __stat64
-# endif
-# ifndef GLOB_LSTAT64
-#  define GLOB_LSTAT64          __lstat64
+# ifndef GLOB_FSTATAT64
+#  define GLOB_FSTATAT64        __fstatat64
 # endif
 # include <shlib-compat.h>
 #else /* !_LIBC */
@@ -88,8 +89,7 @@
 # define struct_stat            struct stat
 # define struct_stat64          struct stat
 # define GLOB_LSTAT             gl_lstat
-# define GLOB_STAT64            stat
-# define GLOB_LSTAT64           lstat
+# define GLOB_FSTATAT64         fstatat
 #endif /* _LIBC */
 
 #include <fnmatch.h>
@@ -215,7 +215,8 @@ glob_lstat (glob_t *pglob, int flags, const char *fullname)
   } ust;
   return (__glibc_unlikely (flags & GLOB_ALTDIRFUNC)
           ? pglob->GLOB_LSTAT (fullname, &ust.st)
-          : GLOB_LSTAT64 (fullname, &ust.st64));
+          : GLOB_FSTATAT64 (AT_FDCWD, fullname, &ust.st64,
+                            AT_SYMLINK_NOFOLLOW));
 }
 
 /* Set *R = A + B.  Return true if the answer is mathematically
@@ -257,7 +258,8 @@ is_dir (char const *filename, int flags, glob_t const *pglob)
   struct_stat64 st64;
   return (__glibc_unlikely (flags & GLOB_ALTDIRFUNC)
           ? pglob->gl_stat (filename, &st) == 0 && S_ISDIR (st.st_mode)
-          : GLOB_STAT64 (filename, &st64) == 0 && S_ISDIR (st64.st_mode));
+          : (GLOB_FSTATAT64 (AT_FDCWD, filename, &st64, 0) == 0
+             && S_ISDIR (st64.st_mode)));
 }
 
 /* Find the end of the sub-pattern in a brace expression.  */
@@ -1283,6 +1285,8 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
 {
   size_t dirlen = strlen (directory);
   void *stream = NULL;
+  struct scratch_buffer s;
+  scratch_buffer_init (&s);
 # define GLOBNAMES_MEMBERS(nnames) \
     struct globnames *next; size_t count; char *name[nnames];
   struct globnames { GLOBNAMES_MEMBERS (FLEXIBLE_ARRAY_MEMBER) };
@@ -1354,6 +1358,7 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
         }
       else
         {
+          int dfd = dirfd (stream);
           int fnm_flags = ((!(flags & GLOB_PERIOD) ? FNM_PERIOD : 0)
                            | ((flags & GLOB_NOESCAPE) ? FNM_NOESCAPE : 0));
           flags |= GLOB_MAGCHAR;
@@ -1381,34 +1386,32 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
               if (flags & GLOB_ONLYDIR)
                 switch (readdir_result_type (d))
                   {
-                  case DT_DIR: case DT_LNK: break;
-                  case DT_UNKNOWN:
-                    {
-                      /* The filesystem was too lazy to give us a hint,
-                         so we have to do it the hard way.  */
-                      char *fullpath, *p;
-                      bool isdir;
-                      int need = strlen (directory) + strlen (d.name) + 2;
-                      int use_alloca = glob_use_alloca (alloca_used, need);
-                      if (use_alloca)
-                        fullpath = alloca_account (need, alloca_used);
-                      else
-                        {
-                          fullpath = malloc (need);
-                          if (fullpath == NULL)
-                            goto memory_error;
-                        }
-                      p = stpcpy (fullpath, directory);
-                      *p++ = '/';
-                      strcpy (p, d.name);
-                      isdir = is_dir (fullpath, flags, pglob);
-                      if (!use_alloca)
-                        free (fullpath);
-                      if (isdir)
-                        break;
-                      continue;
-                    }
                   default: continue;
+                  case DT_DIR: break;
+                  case DT_LNK: case DT_UNKNOWN:
+                    /* The filesystem was too lazy to give us a hint,
+                       so we have to do it the hard way.  */
+                    if (__glibc_unlikely (dfd < 0 || flags & GLOB_ALTDIRFUNC))
+                      {
+                        size_t namelen = strlen (d.name);
+                        size_t need = dirlen + 1 + namelen + 1;
+                        if (s.length < need
+                            && !scratch_buffer_set_array_size (&s, need, 1))
+                          goto memory_error;
+                        char *p = mempcpy (s.data, directory, dirlen);
+                        *p = '/';
+                        p += p[-1] != '/';
+                        memcpy (p, d.name, namelen + 1);
+                        if (! is_dir (s.data, flags, pglob))
+                          continue;
+                      }
+                    else
+                      {
+                        struct_stat64 st64;
+                        if (! (GLOB_FSTATAT64 (dfd, d.name, &st64, 0) == 0
+                               && S_ISDIR (st64.st_mode)))
+                          continue;
+                      }
                   }
 
               if (fnmatch (pattern, d.name, fnm_flags) == 0)
@@ -1540,5 +1543,6 @@ glob_in_dir (const char *pattern, const char *directory, int flags,
       __set_errno (save);
     }
 
+  scratch_buffer_free (&s);
   return result;
 }
diff --git a/modules/glob b/modules/glob
index ba270e8427..83cd729ceb 100644
--- a/modules/glob
+++ b/modules/glob
@@ -17,8 +17,10 @@ alloca          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 builtin-expect  [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 closedir        [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 d-type          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
+dirfd           [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 flexmember      [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 fnmatch         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
+fstatat         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 getlogin_r      [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 libc-config     [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 memchr          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
@@ -26,7 +28,6 @@ mempcpy         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 opendir         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 readdir         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 scratch_buffer  [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
-stat            [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 stdbool         [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 stdint          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
 strdup          [test $HAVE_GLOB = 0 || test $REPLACE_GLOB = 1]
@@ -61,7 +62,7 @@ Link:
 $(LIB_MBRTOWC)
 
 License:
-LGPLv2+
+GPL
 
 Maintainer:
 all, glibc
-- 
2.32.0

From e265ddd557aa6f8d96919f87680c8525ae8fa84c Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Wed, 23 Mar 2022 10:22:51 -0700
Subject: [PATCH 3/3] glob: test for glibc bug 25659

https://sourceware.org/bugzilla/show_bug.cgi?id=25659
* m4/glob.m4 (gl_GLOB): Replace glob if it has bug 25659.
* tests/test-glob.c (main): Test for glibc bug 25659.
---
 ChangeLog         |  5 +++++
 m4/glob.m4        | 41 ++++++++++++++++++++++++++++++++++++++++-
 tests/test-glob.c |  9 ++++++++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index a0d3519162..7ea4f7797b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,10 @@
 2022-03-23  Paul Eggert  <egg...@cs.ucla.edu>
 
+	glob: test for glibc bug 25659
+	https://sourceware.org/bugzilla/show_bug.cgi?id=25659
+	* m4/glob.m4 (gl_GLOB): Replace glob if it has bug 25659.
+	* tests/test-glob.c (main): Test for glibc bug 25659.
+
 	glob: fix symlink and // issues; improve speed
 	* lib/glob.c: Include fcntl.h.
 	(dirfd) [_LIBC]: New macro.
diff --git a/m4/glob.m4 b/m4/glob.m4
index 0d1426389b..cf5f93930c 100644
--- a/m4/glob.m4
+++ b/m4/glob.m4
@@ -1,4 +1,4 @@
-# glob.m4 serial 24
+# glob.m4 serial 25
 dnl Copyright (C) 2005-2007, 2009-2022 Free Software Foundation, Inc.
 dnl This file is free software; the Free Software Foundation
 dnl gives unlimited permission to copy and/or distribute it,
@@ -66,6 +66,45 @@ char a[_GNU_GLOB_INTERFACE_VERSION == 1 || _GNU_GLOB_INTERFACE_VERSION == 2 ? 1
       esac
     fi
 
+    if test $REPLACE_GLOB = 0; then
+      AC_CACHE_CHECK([whether glob NOTDIR*/ omits symlink to nondir],
+                     [gl_cv_glob_omit_nondir_symlinks],
+        [if test $cross_compiling != yes; then
+           if ln -s conf-file conf$$-globtest 2>/dev/null && touch conf-file
+           then
+             gl_cv_glob_omit_nondir_symlinks=maybe
+           else
+             # If we can't make a symlink, then we cannot test this issue.  Be
+             # pessimistic about this.
+             gl_cv_glob_omit_nondir_symlinks=no
+           fi
+           if test $gl_cv_glob_omit_nondir_symlinks = maybe; then
+             AC_RUN_IFELSE(
+               [AC_LANG_PROGRAM(
+                  [[#include <stddef.h>
+                    #include <glob.h>]],
+                  [[glob_t found;
+                    if (glob ("conf*-globtest/", 0, NULL, &found) != GLOB_NOMATCH)
+                      return 1;
+                    globfree (&found);
+                  ]])],
+               [gl_cv_glob_omit_nondir_symlinks=yes],
+               [gl_cv_glob_omit_nondir_symlinks=no],
+               [dnl We don't get here.
+                :
+               ])
+           fi
+           rm -f conf$$-globtest
+         else
+           gl_cv_glob_omit_nondir_symlinks="$gl_cross_guess_normal"
+         fi
+        ])
+      case "$gl_cv_glob_omit_nondir_symlinks" in
+        *yes) ;;
+        *) REPLACE_GLOB=1 ;;
+      esac
+    fi
+
   fi
 
   if test $ac_cv_func_glob_pattern_p = no; then
diff --git a/tests/test-glob.c b/tests/test-glob.c
index 2709080bc7..568acf14b7 100644
--- a/tests/test-glob.c
+++ b/tests/test-glob.c
@@ -72,7 +72,9 @@ main ()
   globfree (&g);
 
   if ((symlink (GL_NO_SUCH_FILE, BASE "globlink1") == 0 || errno == EEXIST)
-      && (symlink (".", BASE "globlink2") == 0 || errno == EEXIST))
+      && (symlink (".", BASE "globlink2") == 0 || errno == EEXIST)
+      && (symlink (BASE "globfile", BASE "globlink3") == 0 || errno == EEXIST)
+      && close (creat (BASE "globfile", 0666)) == 0)
     {
       res = glob (BASE "globlink[12]", 0, NULL, &g);
       ASSERT (res == 0 && g.gl_pathc == 2);
@@ -80,6 +82,11 @@ main ()
       ASSERT (strcmp (g.gl_pathv[1], BASE "globlink2") == 0);
       globfree (&g);
 
+      res = glob (BASE "globlink[123]/", 0, NULL, &g);
+      ASSERT (res == 0 && g.gl_pathc == 1);
+      ASSERT (strcmp (g.gl_pathv[0], BASE "globlink2/") == 0);
+      globfree (&g);
+
       res = glob (BASE "globlink[12]", GLOB_MARK, NULL, &g);
       ASSERT (res == 0 && g.gl_pathc == 2);
       ASSERT (strcmp (g.gl_pathv[0], BASE "globlink1") == 0);
-- 
2.32.0

Reply via email to