On 2025-02-17 Pali Rohár wrote:
> On Monday 17 February 2025 17:37:17 Lasse Collin wrote:
> > I suppose lstat() shouldn't follow any reparse points.  
> 
> That is a question for which I do not have answer.

One would need to figure out what is the least surprising behavior.

> Why should follow:
> - ensure that lstat() and stat() returns same data for file type which
>   is not symbolic link (this is somehow what POSIX applications may
>   except)
> - ensure that lstat() for mount points returns inode where mount point
>   points and not the inode of the underlying mount point itself (this
>   is again POSIX behavior)

These are very good reasons.

> Why should NOT follow:
> - some reparse point, specially those of name surrogate type,
>   represents different file (possible remote) and lstat could return
>   the real local information
> - reparse points for which driver is not installed cannot be opened in
>   "follow mode"

If lstat() did follow and a driver is not installed, lstat() could fail
with a custom errno value, that is, something else than ENOENT. Perhaps
ENXIO could be OK. Maybe stat() could do the same.

> > Maybe there could
> > be non-POSIX macros like S_IFRPP and S_ISRPP(m) to indicate any
> > other reparse point than symlink or socket. Some OSes have extra
> > macros but I don't know if the idea makes sense here.  
> 
> Unless there is a real needs from applications, I would rather not add
> any new non-POSIX/non-GNU macros.

Right, better to not add anything special until there's a clear need
for it.

If something new was introduced, one variation is to make lstat()
follow reparse points except symlinks and sockets, and add _rppstat()
that returns S_IFRPP for other reparse points. The idea would be that
POSIXish apps that care about reparse points could be adapted with
smaller changes. I definitely am not suggesting to do this, I'm just
thinking out aloud. :-)

> > If UCRT added S_IFLNK and S_IFSOCK but used different constants than
> > mingw-w64, that would be a mess. I'm not proposing any S_ macro
> > additions in this email, I'm just thinking out aloud.  
> 
> IIRC UCRT has not added support yet. But sure it is not a good idea to
> add something incompatible.

Microsoft already uses the same values for S_IFxxx as everyone else. It
would be a big surprise if they did differently, for example, with
S_IFSOCK.

If they added S_IFBLK, there's a small chance that they could look at
mingw-w64, and then adopt the weird 0x3000. I wonder how much breakage
it would cause if S_IFBLK was changed in mingw-w64, and if that risk is
worth it to get rid of this small portability trap. I *guess* that both
negative and postive effects cannot affect many packages.

I quickly tried to find examples, and didn't find many. The tcl package
has ucrt64/include/tcl8.6/tcl-private/win/tclWinPort.h which defines
S_IFBLK to 0x6000 if S_IFBLK isn't already defined. It was added in
2016. The comment refers to "Linux and MingW" but the author might have
missed the odd value. Thus MSVC and mingw-w64 built tcl use different
S_IFBLK value.

    https://packages.msys2.org/packages/mingw-w64-ucrt-x86_64-tcl

    https://github.com/tcltk/tcl/commit/48e3873f199817b52d54a4d802966b508917d200

Python's stat.S_IFBLK and stat.S_ISBLK(mode) use the values from
<sys/stat.h>. Thus, MSYS2-Python uses 0x6000 and UCRT64-Python
uses 0x3000. I suppose MSVC-Python uses 0x6000 because it's the
fallback value when S_IFBLK isn't already defined. The comment about
the constants suggests that no one noticed that mingw-w64 has an
unusual value:

    
https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Modules/_stat.c#L55

0x6000 is used in stat.py too, but due to the import at the bottom,
the values from _stat.c are used instead.

    
https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Lib/stat.py#L33

You can try the following Python code:

    import os
    import stat

    mode = os.stat(".").st_mode
    print("0x%x" % mode)
    print(stat.S_ISDIR(mode))
    print(stat.S_ISBLK(mode))

    mode = stat.S_IFBLK
    print("0x%x" % mode)
    print(stat.S_ISBLK(mode))

I'm not brave enough to actually propose that mingw-w64 should change
S_IFBLK to 0x6000, but I still wonder if keeping the current value does
more bad than good.

> I would propose: Do not add DT_BLK for now. And once we figure out
> that it is really useful, we can add it. So we do not need to think
> which value DT_BLK should have (unless there is no doubt that it
> should have numeric value XXXX).

I agree. It shouldn't be a disaster if S_IFBLK and DT_BLK values aren't
in sync, but since DT_BLK isn't actually required, it's better to not
define it for now.

> > Are you certain that IO_REPARSE_TAG_AF_UNIX is the right tag for
> > Win32 AF_UNIX?  
> 
> Yes, I'm sure.

Thanks! :-)

> >   - If adding DT_SOCK, is this OK:
> > 
> >     static unsigned char
> >     get_d_type (DWORD attrs, DWORD reparse_tag)
> >     {
> >       if (attrs & FILE_ATTRIBUTE_REPARSE_POINT)
> >         {
> >           switch (reparse_tag)
> >             {
> >               case IO_REPARSE_TAG_SYMLINK:
> >                 return DT_LNK;
> > 
> >               case IO_REPARSE_TAG_AF_UNIX:
> >                 return DT_SOCK;
> > 
> >               default:
> >                 return DT_UNKNOWN;
> >             }
> >         }
> > 
> >       return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
> >     }  
> 
> For me this looks good.

Thanks! I attached another fixup patch. I shortened seekdir doc in
dirent.h a bit too. It doesn't need to be quite so specific about the
implementation details.

> > I think NAME_MAX should be increased at least if modern MSVC doesn't
> > define it. Making NAME_MAX visible with _POSIX_C_SOURCE etc. should
> > be simple, one just needs to be careful that all relevant macros
> > are listed correctly.  
> 
> I do not know about this one point.

I have changed my mind. Let's not touch NAME_MAX. It might even be good
to keep it undefined as it is now (I suppose very few apps define the
non-standard _POSIX_ macro).

(1) Apps can misuse NAME_MAX. For example, in an earlier email I linked
    to binutils bug about NAME_MAX and mingw-w64. Seems that binutils
    uses NAME_MAX in exactly one place, and there it assumes that any
    filename up to NAME_MAX ASCII chars can be *created*. If other apps
    have similar assumptions, increasing NAME_MAX can create problems.

    
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ef21423b06d6b30956dd44adee9df8325bb48aa0;hb=HEAD#l4002

(2) Even if NAME_MAX was defined on a system, there might not be any
    specific limit in reality. It's one reason why readdir_r() is
    deprecated. For example, NAME_MAX is 255 on Linux, but on FAT32
    mounted with iocharset=utf8, one can have 765 bytes long filenames
    because the limit is 255 UTF-16 code units. Most apps don't care
    about NAME_MAX, thus such names work normally.

    Linux actually claims 2 * 765 for NAME_MAX on FAT32:

        $ getconf NAME_MAX /mnt
        1530

    I don't think it's possible on FAT32. It's just a big enough value
    calculated as FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE.

(3) In POSIX.1-2024, the second paragraph under "Path Variable Names"
    says that macros in <limits.h> shall be omitted in some cases. That
    doesn't really apply to the case in mingw-w64, and there's no
    pathconf() in mingw-w64, but the point is that apps shouldn't
    assume that NAME_MAX exists.

    
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html#tag_13_24_03_02

> Some more information. Visual C++ CRT file stdio.h defines following
> constant:
> 
> #define FILENAME_MAX 260
> 
> And it defines it since the first Visual C++ 1.0 version for 32-bit NT
> up to the last version with UCRT support.
> 
> So I guess that value 260 comes from this FILENAME_MAX macro.
> 
> But what is point of FILENAME_MAX I have no idea, it is not referenced
> or used in CRT header files.

FILENAME_MAX is at least in C99 and newer standards. Its definition
is kind of relaxed. See C99 7.19.1 or C11/C17/C23 7.21.1. Defining
FILENAME_MAX to the same value as MAX_PATH sounds sensible because
MAX_PATH or _MAX_PATH is a limit in a few CRT APIs.

Size of d_name is derived from WIN32_FIND_DATAW.cFileName[MAX_PATH] in
the new dirent code (_finddata_t.filename[_MAX_PATH] in the old code).
I suppose it's more logical to refer to MAX_PATH in the dirent.h
comments.

-- 
Lasse Collin
From 0bc73b9ae3dc02efa733ec45230315dd1a0c06a2 Mon Sep 17 00:00:00 2001
From: Lasse Collin <lasse.col...@tukaani.org>
Date: Tue, 18 Feb 2025 17:38:11 +0200
Subject: [PATCH 9/9] ... Reparse point tweaks: DT_LNK, DT_SOCK, DT_UNKNOWN

---
 mingw-w64-crt/misc/dirent.c    | 34 +++++++++-------
 mingw-w64-headers/crt/dirent.h | 71 ++++++++++++++--------------------
 2 files changed, 49 insertions(+), 56 deletions(-)

diff --git a/mingw-w64-crt/misc/dirent.c b/mingw-w64-crt/misc/dirent.c
index f11e2db83..90b5d2b08 100644
--- a/mingw-w64-crt/misc/dirent.c
+++ b/mingw-w64-crt/misc/dirent.c
@@ -382,21 +382,25 @@ opendir (const char *path)
 static unsigned char
 get_d_type (DWORD attrs, DWORD reparse_tag)
 {
-    /* Check for a reparse point before checking if it is a directory.
-     * A reparse point can also have the directory attribute. This
-     * includes "directory symlinks" too. */
-    if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT) &&
-       reparse_tag == IO_REPARSE_TAG_SYMLINK)
-      {
-       return DT_LNK;
-      }
-
-    if (attrs & FILE_ATTRIBUTE_DIRECTORY)
-      {
-       return DT_DIR;
-      }
-
-    return DT_REG;
+  /* Check for a reparse point before checking if it is a directory.
+   * A reparse point can also have the directory attribute. This
+   * includes "directory symlinks" too. */
+  if (attrs & FILE_ATTRIBUTE_REPARSE_POINT)
+    {
+      switch (reparse_tag)
+       {
+         case IO_REPARSE_TAG_SYMLINK:
+           return DT_LNK;
+
+         case IO_REPARSE_TAG_AF_UNIX:
+           return DT_SOCK;
+
+         default:
+           return DT_UNKNOWN;
+       }
+    }
+
+  return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
 }
 
 
diff --git a/mingw-w64-headers/crt/dirent.h b/mingw-w64-headers/crt/dirent.h
index 06d95650a..7a27f725c 100644
--- a/mingw-w64-headers/crt/dirent.h
+++ b/mingw-w64-headers/crt/dirent.h
@@ -22,36 +22,32 @@ extern "C" {
 #endif
 
 
-/* In addition to the d_ino and d_name members required by POSIX,
+/*
+ * In addition to the d_ino and d_name members required by POSIX,
  * also d_type, d_8dot3, d_reclen, and d_namlen are present in
- * struct dirent and struct _wdirent. */
+ * struct dirent and struct _wdirent.
+ */
 #define _DIRENT_HAVE_D_TYPE    1
 #define _DIRENT_HAVE_D_8DOT3   1
 #define _DIRENT_HAVE_D_RECLEN  1
 #define _DIRENT_HAVE_D_NAMLEN  1
 
-/* Values for d_type:
- *
- *   - DT_DIR and DT_REG are used when the entry is a directory or
- *     a regular file.
- *
- *   - DT_LNK is only used for reparse points that have the tag
- *     IO_REPARSE_TAG_SYMLINK. Other reparse points are DT_DIR or DT_REG.
- *
- *   - Other values aren't possible in a directory listing. They are only
- *     defined for compatibility.
- *
- * The constants match glibc and *BSDs:
- *
- *   - DT_FIFO == (S_IFIFO >> 12)
- *     DT_CHR == (S_IFCHR >> 12)
- *     DT_DIR == (S_IFDIR >> 12)
- *     DT_REG == (S_IFREG >> 12)
- *
- *   - There is no S_IFLNK on Windows. However, on other systems,
- *     S_IFLNK == (S_IFREG | S_IFCHR), and S_IFCHR has the same
- *     value in glibc, *BSDs, and Windows. Thus the same DT_LNK
- *     constant is used here as on the other systems.
+/*
+ * Values for d_type:
+ *   DT_DIR      Not a reparse point and has FILE_ATTRIBUTE_DIRECTORY
+ *   DT_REG      Not a reparse point and doesn't have FILE_ATTRIBUTE_DIRECTORY
+ *   DT_LNK      A reparse point with the tag IO_REPARSE_TAG_SYMLINK
+ *   DT_SOCK     A reparse point with the tag IO_REPARSE_TAG_AF_UNIX
+ *   DT_UNKNOWN  All other reparse points
+ *   DT_FIFO     Not used (cannot appear in a directory listing)
+ *   DT_CHR      Not used (cannot appear in a directory listing)
+ *
+ * The constants are the same as on many POSIX systems and match the
+ * pattern DT_xxx == (S_IFxxx >> 12). However, as of Febrary 2025:
+ *   - S_IFLNK and S_IFSOCK don't exist in mingw-w64.
+ *   - S_IFBLK is a mingw-w64 extension with an unusual value 0x3000 instead
+ *     of the normal 0x6000, thus DT_BLK was omitted for now. Like DT_FIFO
+ *     and DT_CHR, DT_BLK wouldn't appear in a directory listing.
  */
 #define DT_UNKNOWN     0
 #define DT_FIFO                1
@@ -59,6 +55,7 @@ extern "C" {
 #define DT_DIR         4
 #define DT_REG         8
 #define DT_LNK         10
+#define DT_SOCK                12
 
 struct dirent
 {
@@ -67,7 +64,7 @@ struct dirent
        unsigned char   d_8dot3;        /* 0 or 1, see _readdir_8dot3. */
        unsigned short  d_reclen;       /* Size of this struct. */
        unsigned short  d_namlen;       /* Length of name in d_name. */
-       char            d_name[778]; /* [NAME_MAX + 1] */ /* File name. */
+       char            d_name[778]; /* [(MAX_PATH-1)*3+1] */ /* File name. */
 };
 
 typedef struct __dirent_DIR DIR;
@@ -156,7 +153,7 @@ int __cdecl __MINGW_NOTHROW closedir (DIR*);
  * or _readdir_8dot3.
  *
  * After rewinding, the next call to readdir, _readdir_8dot3, or _wreaddir
- * has to reopen the directory. It can fail, for example, if the directory
+ * will reopen the directory. It can fail, for example, if the directory
  * has been renamed, removed, or access permissions have changed.
  */
 void __cdecl __MINGW_NOTHROW rewinddir (DIR*);
@@ -173,20 +170,12 @@ long __cdecl __MINGW_NOTHROW telldir (DIR*);
  * _wreaddir call. A possible delayed EILSEQ set by an earlier call to
  * readdir or _readdir_8dot3 is preserved (this differs from rewinddir).
  *
- * If seeking back by more than one entry, the next call to readdir,
- * _readdir_8dot3, or _wreaddir has to reopen the directory. It can fail,
- * for example, if the directory has been renamed, removed, or access
- * permissions have changed. If reopening is successful, the readdir call
- * will try to reach the requested position. This produces expected results
- * only if the file system always returns the entries in the same order.
- *
- * If seeking back by exactly one entry, the next readdir, _readdir_8dot3,
- * or _wreaddir will try to return the previous entry again which is fast.
- * However, this is guaranteed to succeed only when using _wreaddir in the
- * sequence telldir ; _wreaddir ; seekdir ; _wreaddir. If readdir or
- * _readdir_8dot3 is used instead, a possible character set conversion
- * problem may result in entries to be skipped, and then the seekdir
- * step needs to seek backwards more than one entry.
+ * If seeking backwards, the next call to readdir, _readdir_8dot3, or
+ * _wreaddir might need to reopen the directory. It can fail, for example,
+ * if the directory has been renamed, removed, or access permissions
+ * have changed. If reopening is successful, readdir will try to reach
+ * the requested position. This produces expected results only if the
+ * file system always returns the entries in the same order.
  */
 void __cdecl __MINGW_NOTHROW seekdir (DIR*, long);
 
@@ -204,7 +193,7 @@ struct _wdirent
        unsigned char   d_8dot3;        /* Always zero. */
        unsigned short  d_reclen;       /* Size of this struct. */
        unsigned short  d_namlen;       /* Length of name in d_name. */
-       wchar_t         d_name[260]; /* [FILENAME_MAX] */ /* File name. */
+       wchar_t         d_name[260]; /* [MAX_PATH] */ /* File name. */
 };
 
 /*
-- 
2.48.1

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public

Reply via email to