On 2025-02-17 Pali Rohár wrote: > On Monday 17 February 2025 17:37:17 Lasse Collin wrote: > > I suppose lstat() shouldn't follow any reparse points. > > That is a question for which I do not have answer.
One would need to figure out what is the least surprising behavior. > Why should follow: > - ensure that lstat() and stat() returns same data for file type which > is not symbolic link (this is somehow what POSIX applications may > except) > - ensure that lstat() for mount points returns inode where mount point > points and not the inode of the underlying mount point itself (this > is again POSIX behavior) These are very good reasons. > Why should NOT follow: > - some reparse point, specially those of name surrogate type, > represents different file (possible remote) and lstat could return > the real local information > - reparse points for which driver is not installed cannot be opened in > "follow mode" If lstat() did follow and a driver is not installed, lstat() could fail with a custom errno value, that is, something else than ENOENT. Perhaps ENXIO could be OK. Maybe stat() could do the same. > > Maybe there could > > be non-POSIX macros like S_IFRPP and S_ISRPP(m) to indicate any > > other reparse point than symlink or socket. Some OSes have extra > > macros but I don't know if the idea makes sense here. > > Unless there is a real needs from applications, I would rather not add > any new non-POSIX/non-GNU macros. Right, better to not add anything special until there's a clear need for it. If something new was introduced, one variation is to make lstat() follow reparse points except symlinks and sockets, and add _rppstat() that returns S_IFRPP for other reparse points. The idea would be that POSIXish apps that care about reparse points could be adapted with smaller changes. I definitely am not suggesting to do this, I'm just thinking out aloud. :-) > > If UCRT added S_IFLNK and S_IFSOCK but used different constants than > > mingw-w64, that would be a mess. I'm not proposing any S_ macro > > additions in this email, I'm just thinking out aloud. > > IIRC UCRT has not added support yet. But sure it is not a good idea to > add something incompatible. Microsoft already uses the same values for S_IFxxx as everyone else. It would be a big surprise if they did differently, for example, with S_IFSOCK. If they added S_IFBLK, there's a small chance that they could look at mingw-w64, and then adopt the weird 0x3000. I wonder how much breakage it would cause if S_IFBLK was changed in mingw-w64, and if that risk is worth it to get rid of this small portability trap. I *guess* that both negative and postive effects cannot affect many packages. I quickly tried to find examples, and didn't find many. The tcl package has ucrt64/include/tcl8.6/tcl-private/win/tclWinPort.h which defines S_IFBLK to 0x6000 if S_IFBLK isn't already defined. It was added in 2016. The comment refers to "Linux and MingW" but the author might have missed the odd value. Thus MSVC and mingw-w64 built tcl use different S_IFBLK value. https://packages.msys2.org/packages/mingw-w64-ucrt-x86_64-tcl https://github.com/tcltk/tcl/commit/48e3873f199817b52d54a4d802966b508917d200 Python's stat.S_IFBLK and stat.S_ISBLK(mode) use the values from <sys/stat.h>. Thus, MSYS2-Python uses 0x6000 and UCRT64-Python uses 0x3000. I suppose MSVC-Python uses 0x6000 because it's the fallback value when S_IFBLK isn't already defined. The comment about the constants suggests that no one noticed that mingw-w64 has an unusual value: https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Modules/_stat.c#L55 0x6000 is used in stat.py too, but due to the import at the bottom, the values from _stat.c are used instead. https://github.com/python/cpython/blob/b93b7e566e5a4efe7f077af2083140e50bd2b08f/Lib/stat.py#L33 You can try the following Python code: import os import stat mode = os.stat(".").st_mode print("0x%x" % mode) print(stat.S_ISDIR(mode)) print(stat.S_ISBLK(mode)) mode = stat.S_IFBLK print("0x%x" % mode) print(stat.S_ISBLK(mode)) I'm not brave enough to actually propose that mingw-w64 should change S_IFBLK to 0x6000, but I still wonder if keeping the current value does more bad than good. > I would propose: Do not add DT_BLK for now. And once we figure out > that it is really useful, we can add it. So we do not need to think > which value DT_BLK should have (unless there is no doubt that it > should have numeric value XXXX). I agree. It shouldn't be a disaster if S_IFBLK and DT_BLK values aren't in sync, but since DT_BLK isn't actually required, it's better to not define it for now. > > Are you certain that IO_REPARSE_TAG_AF_UNIX is the right tag for > > Win32 AF_UNIX? > > Yes, I'm sure. Thanks! :-) > > - If adding DT_SOCK, is this OK: > > > > static unsigned char > > get_d_type (DWORD attrs, DWORD reparse_tag) > > { > > if (attrs & FILE_ATTRIBUTE_REPARSE_POINT) > > { > > switch (reparse_tag) > > { > > case IO_REPARSE_TAG_SYMLINK: > > return DT_LNK; > > > > case IO_REPARSE_TAG_AF_UNIX: > > return DT_SOCK; > > > > default: > > return DT_UNKNOWN; > > } > > } > > > > return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG; > > } > > For me this looks good. Thanks! I attached another fixup patch. I shortened seekdir doc in dirent.h a bit too. It doesn't need to be quite so specific about the implementation details. > > I think NAME_MAX should be increased at least if modern MSVC doesn't > > define it. Making NAME_MAX visible with _POSIX_C_SOURCE etc. should > > be simple, one just needs to be careful that all relevant macros > > are listed correctly. > > I do not know about this one point. I have changed my mind. Let's not touch NAME_MAX. It might even be good to keep it undefined as it is now (I suppose very few apps define the non-standard _POSIX_ macro). (1) Apps can misuse NAME_MAX. For example, in an earlier email I linked to binutils bug about NAME_MAX and mingw-w64. Seems that binutils uses NAME_MAX in exactly one place, and there it assumes that any filename up to NAME_MAX ASCII chars can be *created*. If other apps have similar assumptions, increasing NAME_MAX can create problems. https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=binutils/dlltool.c;h=ef21423b06d6b30956dd44adee9df8325bb48aa0;hb=HEAD#l4002 (2) Even if NAME_MAX was defined on a system, there might not be any specific limit in reality. It's one reason why readdir_r() is deprecated. For example, NAME_MAX is 255 on Linux, but on FAT32 mounted with iocharset=utf8, one can have 765 bytes long filenames because the limit is 255 UTF-16 code units. Most apps don't care about NAME_MAX, thus such names work normally. Linux actually claims 2 * 765 for NAME_MAX on FAT32: $ getconf NAME_MAX /mnt 1530 I don't think it's possible on FAT32. It's just a big enough value calculated as FAT_LFN_LEN * NLS_MAX_CHARSET_SIZE. (3) In POSIX.1-2024, the second paragraph under "Path Variable Names" says that macros in <limits.h> shall be omitted in some cases. That doesn't really apply to the case in mingw-w64, and there's no pathconf() in mingw-w64, but the point is that apps shouldn't assume that NAME_MAX exists. https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html#tag_13_24_03_02 > Some more information. Visual C++ CRT file stdio.h defines following > constant: > > #define FILENAME_MAX 260 > > And it defines it since the first Visual C++ 1.0 version for 32-bit NT > up to the last version with UCRT support. > > So I guess that value 260 comes from this FILENAME_MAX macro. > > But what is point of FILENAME_MAX I have no idea, it is not referenced > or used in CRT header files. FILENAME_MAX is at least in C99 and newer standards. Its definition is kind of relaxed. See C99 7.19.1 or C11/C17/C23 7.21.1. Defining FILENAME_MAX to the same value as MAX_PATH sounds sensible because MAX_PATH or _MAX_PATH is a limit in a few CRT APIs. Size of d_name is derived from WIN32_FIND_DATAW.cFileName[MAX_PATH] in the new dirent code (_finddata_t.filename[_MAX_PATH] in the old code). I suppose it's more logical to refer to MAX_PATH in the dirent.h comments. -- Lasse Collin
From 0bc73b9ae3dc02efa733ec45230315dd1a0c06a2 Mon Sep 17 00:00:00 2001 From: Lasse Collin <lasse.col...@tukaani.org> Date: Tue, 18 Feb 2025 17:38:11 +0200 Subject: [PATCH 9/9] ... Reparse point tweaks: DT_LNK, DT_SOCK, DT_UNKNOWN --- mingw-w64-crt/misc/dirent.c | 34 +++++++++------- mingw-w64-headers/crt/dirent.h | 71 ++++++++++++++-------------------- 2 files changed, 49 insertions(+), 56 deletions(-) diff --git a/mingw-w64-crt/misc/dirent.c b/mingw-w64-crt/misc/dirent.c index f11e2db83..90b5d2b08 100644 --- a/mingw-w64-crt/misc/dirent.c +++ b/mingw-w64-crt/misc/dirent.c @@ -382,21 +382,25 @@ opendir (const char *path) static unsigned char get_d_type (DWORD attrs, DWORD reparse_tag) { - /* Check for a reparse point before checking if it is a directory. - * A reparse point can also have the directory attribute. This - * includes "directory symlinks" too. */ - if ((attrs & FILE_ATTRIBUTE_REPARSE_POINT) && - reparse_tag == IO_REPARSE_TAG_SYMLINK) - { - return DT_LNK; - } - - if (attrs & FILE_ATTRIBUTE_DIRECTORY) - { - return DT_DIR; - } - - return DT_REG; + /* Check for a reparse point before checking if it is a directory. + * A reparse point can also have the directory attribute. This + * includes "directory symlinks" too. */ + if (attrs & FILE_ATTRIBUTE_REPARSE_POINT) + { + switch (reparse_tag) + { + case IO_REPARSE_TAG_SYMLINK: + return DT_LNK; + + case IO_REPARSE_TAG_AF_UNIX: + return DT_SOCK; + + default: + return DT_UNKNOWN; + } + } + + return (attrs & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG; } diff --git a/mingw-w64-headers/crt/dirent.h b/mingw-w64-headers/crt/dirent.h index 06d95650a..7a27f725c 100644 --- a/mingw-w64-headers/crt/dirent.h +++ b/mingw-w64-headers/crt/dirent.h @@ -22,36 +22,32 @@ extern "C" { #endif -/* In addition to the d_ino and d_name members required by POSIX, +/* + * In addition to the d_ino and d_name members required by POSIX, * also d_type, d_8dot3, d_reclen, and d_namlen are present in - * struct dirent and struct _wdirent. */ + * struct dirent and struct _wdirent. + */ #define _DIRENT_HAVE_D_TYPE 1 #define _DIRENT_HAVE_D_8DOT3 1 #define _DIRENT_HAVE_D_RECLEN 1 #define _DIRENT_HAVE_D_NAMLEN 1 -/* Values for d_type: - * - * - DT_DIR and DT_REG are used when the entry is a directory or - * a regular file. - * - * - DT_LNK is only used for reparse points that have the tag - * IO_REPARSE_TAG_SYMLINK. Other reparse points are DT_DIR or DT_REG. - * - * - Other values aren't possible in a directory listing. They are only - * defined for compatibility. - * - * The constants match glibc and *BSDs: - * - * - DT_FIFO == (S_IFIFO >> 12) - * DT_CHR == (S_IFCHR >> 12) - * DT_DIR == (S_IFDIR >> 12) - * DT_REG == (S_IFREG >> 12) - * - * - There is no S_IFLNK on Windows. However, on other systems, - * S_IFLNK == (S_IFREG | S_IFCHR), and S_IFCHR has the same - * value in glibc, *BSDs, and Windows. Thus the same DT_LNK - * constant is used here as on the other systems. +/* + * Values for d_type: + * DT_DIR Not a reparse point and has FILE_ATTRIBUTE_DIRECTORY + * DT_REG Not a reparse point and doesn't have FILE_ATTRIBUTE_DIRECTORY + * DT_LNK A reparse point with the tag IO_REPARSE_TAG_SYMLINK + * DT_SOCK A reparse point with the tag IO_REPARSE_TAG_AF_UNIX + * DT_UNKNOWN All other reparse points + * DT_FIFO Not used (cannot appear in a directory listing) + * DT_CHR Not used (cannot appear in a directory listing) + * + * The constants are the same as on many POSIX systems and match the + * pattern DT_xxx == (S_IFxxx >> 12). However, as of Febrary 2025: + * - S_IFLNK and S_IFSOCK don't exist in mingw-w64. + * - S_IFBLK is a mingw-w64 extension with an unusual value 0x3000 instead + * of the normal 0x6000, thus DT_BLK was omitted for now. Like DT_FIFO + * and DT_CHR, DT_BLK wouldn't appear in a directory listing. */ #define DT_UNKNOWN 0 #define DT_FIFO 1 @@ -59,6 +55,7 @@ extern "C" { #define DT_DIR 4 #define DT_REG 8 #define DT_LNK 10 +#define DT_SOCK 12 struct dirent { @@ -67,7 +64,7 @@ struct dirent unsigned char d_8dot3; /* 0 or 1, see _readdir_8dot3. */ unsigned short d_reclen; /* Size of this struct. */ unsigned short d_namlen; /* Length of name in d_name. */ - char d_name[778]; /* [NAME_MAX + 1] */ /* File name. */ + char d_name[778]; /* [(MAX_PATH-1)*3+1] */ /* File name. */ }; typedef struct __dirent_DIR DIR; @@ -156,7 +153,7 @@ int __cdecl __MINGW_NOTHROW closedir (DIR*); * or _readdir_8dot3. * * After rewinding, the next call to readdir, _readdir_8dot3, or _wreaddir - * has to reopen the directory. It can fail, for example, if the directory + * will reopen the directory. It can fail, for example, if the directory * has been renamed, removed, or access permissions have changed. */ void __cdecl __MINGW_NOTHROW rewinddir (DIR*); @@ -173,20 +170,12 @@ long __cdecl __MINGW_NOTHROW telldir (DIR*); * _wreaddir call. A possible delayed EILSEQ set by an earlier call to * readdir or _readdir_8dot3 is preserved (this differs from rewinddir). * - * If seeking back by more than one entry, the next call to readdir, - * _readdir_8dot3, or _wreaddir has to reopen the directory. It can fail, - * for example, if the directory has been renamed, removed, or access - * permissions have changed. If reopening is successful, the readdir call - * will try to reach the requested position. This produces expected results - * only if the file system always returns the entries in the same order. - * - * If seeking back by exactly one entry, the next readdir, _readdir_8dot3, - * or _wreaddir will try to return the previous entry again which is fast. - * However, this is guaranteed to succeed only when using _wreaddir in the - * sequence telldir ; _wreaddir ; seekdir ; _wreaddir. If readdir or - * _readdir_8dot3 is used instead, a possible character set conversion - * problem may result in entries to be skipped, and then the seekdir - * step needs to seek backwards more than one entry. + * If seeking backwards, the next call to readdir, _readdir_8dot3, or + * _wreaddir might need to reopen the directory. It can fail, for example, + * if the directory has been renamed, removed, or access permissions + * have changed. If reopening is successful, readdir will try to reach + * the requested position. This produces expected results only if the + * file system always returns the entries in the same order. */ void __cdecl __MINGW_NOTHROW seekdir (DIR*, long); @@ -204,7 +193,7 @@ struct _wdirent unsigned char d_8dot3; /* Always zero. */ unsigned short d_reclen; /* Size of this struct. */ unsigned short d_namlen; /* Length of name in d_name. */ - wchar_t d_name[260]; /* [FILENAME_MAX] */ /* File name. */ + wchar_t d_name[260]; /* [MAX_PATH] */ /* File name. */ }; /* -- 2.48.1
_______________________________________________ Mingw-w64-public mailing list Mingw-w64-public@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mingw-w64-public