More experiments revealed more inconsistencies. In Windows Explorer, `\\host\share\doc\..\..` goes to a page listing all share names on the host, `\\host\share\doc\..\..\..` goes to Network, and `\\host\share\doc\..\..\..\..` goes to Desktop.
However, in CMD and PowerShell, `\\host\share\doc\..\..` stops at `\\host\share\`. According to the Microsoft documentation about paths, which I linked yesterday, the `\\host\share` part is the name of a volume, so I think only the CMD behavior is right: `dirname()` should not remove a `..` that would move to a different volume. Here is the alternative patch.
-- 
Best regards,
LIU Hao
From c4f11d50c9abef455ac4ef334d7fa89de929a316 Mon Sep 17 00:00:00 2001
From: LIU Hao <[email protected]>
Date: Sun, 26 Mar 2023 02:02:52 +0800
Subject: [PATCH] crt: Reimplement `dirname()` and `basename()`

It is necessary to re-implement these two functions because

1. They used to change the global locale and were subject to races with
   almost all stdio functions.
2. The previous `basename()` had a VLA and might effect stack overflows if
   the argument path was too long.
3. They used to produce erroneous results if the argument path was not in
   the default ANSI code page. (I don't think this is a bug though, just a
   design flaw.)

According to Microsoft documentation about `fopen()` [1], paths are
interpreted with `CP_ACP` if `AreFileApisANSI()` returns true, and
`CP_OEMCP` otherwise. We had better follow that convention.

UNC-ized DOS paths should also be handled, but they cannot be relative [2].

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170
[2] https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#identify-the-path

Signed-off-by: LIU Hao <[email protected]>
---
 mingw-w64-crt/misc/dirname.c | 281 +++++++++++++++++++++++++++++++++++
 1 file changed, 281 insertions(+)

diff --git a/mingw-w64-crt/misc/dirname.c b/mingw-w64-crt/misc/dirname.c
index e69de29bb..d43509c50 100644
--- a/mingw-w64-crt/misc/dirname.c
+++ b/mingw-w64-crt/misc/dirname.c
@@ -0,0 +1,281 @@
+/**
+ * This file has no copyright assigned and is placed in the Public Domain.
+ * This file is part of the mingw-w64 runtime package.
+ * No warranty is given; refer to the file DISCLAIMER.PD within this package.
+ */
+#ifndef WIN32_LEAN_AND_MEAN
+#define WIN32_LEAN_AND_MEAN
+#endif
+#include <stdlib.h>
+#include <libgen.h>
+#include <windows.h>
+
+/* A 'directory separator' is a byte that equals 0x2F ('solidus' or more
+ * commonly 'forward slash') or 0x5C ('reverse solidus' or more commonly
+ * 'backward slash'). The byte 0x5C may look different from a backward slash
+ * in some locales; for example, it looks the same as a Yen sign in Japanese
+ * locales and a Won sign in Korean locales. Despite its appearance, it still
+ * functions as a directory separator.
+ *
+ * A 'DOS path' comprises an optional DOS drive letter with a colon, and then
+ * an arbitrary number of possibly empty components, separated by non-empty
+ * sequences of directory separators (in other words, consecutive directory
+ * separators are treated as a single one). A path that comprises an empty
+ * component denotes the current working directory.
+ *
+ * An 'absolute path' comprises at least two components, the first of which
+ * is empty.
+ *
+ * A 'relative path' is a path that is not an absolute path. In other words,
+ * it either comprises an empty component, or begins with a non-empty
+ * component.
+ *
+ * POSIX doesn't have a concept about DOS drives. A path that does not have a
+ * drive letter starts from the same drive as the current working directory.
+ * UNC paths are handled as if the `\\host-name\share-name` part was a DOS
+ * drive.
+ *
+ * For example:
+ * (Examples without drive letters match POSIX.)
+ *
+ *   Argument                 dirname() returns        basename() returns
+ *   --------                 -----------------        ------------------
+ *   `` or NULL               `.`                      `.`
+ *   `usr`                    `.`                      `usr`
+ *   `usr\`                   `.`                      `usr`
+ *   `\`                      `\`                      `\`
+ *   `\usr`                   `\`                      `usr`
+ *   `\usr\lib`               `\usr`                   `lib`
+ *   `\home\\dwc\\test`       `\home\\dwc`             `test`
+ *   `C:`                     `C:.`                    `.`
+ *   `C:usr`                  `C:.`                    `usr`
+ *   `C:usr\`                 `C:.`                    `usr`
+ *   `C:\`                    `C:\`                    `\`
+ *   `C:\\`                   `C:\`                    `\`
+ *   `C:\\\`                  `C:\`                    `\`
+ *   `C:\usr`                 `C:\`                    `usr`
+ *   `C:\usr\lib`             `C:\usr`                 `lib`
+ *   `C:\\usr\\lib\\`         `C:\\usr`                `lib`
+ *   `C:\home\\dwc\\test`     `C:\home\\dwc`           `test`
+ *   `\\host\usr`             `\\host\usr\`            `\`
+ *   `\\host\usr\lib`         `\\host\usr\`            `lib`
+ */
+
+struct path_info
+  {
+    /* This points to end of the UNC prefix and drive letter, if any.  */
+    char* prefix_end;
+
+    /* These point to the directory separator in front of the last non-empty
+     * component.  */
+    char* base_sep_begin;
+    char* base_sep_end;
+
+    /* This points to the last directory separator sequence if no other
+     * non-separator characters follow it.  */
+    char* term_sep_begin;
+
+    /* This points to the end of the string.  */
+    char* path_end;
+  };
+
+#define IS_DIR_SEP(c)  ((c) == '/' || (c) == '\\')
+
+static
+void
+do_get_path_info(struct path_info* info, char* path)
+  {
+    char* pos = path;
+    DWORD cp;
+    int dbcs_tb, sep_cnt, dir_sep, dos_dev;
+
+    /* Get the code page for paths in the same way as `fopen()`.  */
+    cp = AreFileApisANSI() ? CP_ACP : CP_OEMCP;
+
+    /* Set the structure to 'no data'.  */
+    info->prefix_end = NULL;
+    info->base_sep_begin = NULL;
+    info->base_sep_end = NULL;
+    info->term_sep_begin = NULL;
+
+    /* Check for a UNC prefix.  */
+    if(IS_DIR_SEP(pos[0]) && IS_DIR_SEP(pos[1])) {
+      pos += 2;
+      info->prefix_end = pos;
+
+      /* Seek to the end of the share name.  */
+      dbcs_tb = 0;
+      sep_cnt = 0;
+
+      while(*pos != 0) {
+        dir_sep = 0;
+
+        if(dbcs_tb)
+          dbcs_tb = 0;
+        else if(IsDBCSLeadByteEx(cp, *pos))
+          dbcs_tb = 1;
+        else
+          dir_sep = IS_DIR_SEP(*pos);
+
+        if(dir_sep && ++ sep_cnt == 2)
+          break;
+
+        pos ++;
+      }
+
+      if(*pos == 0) {
+        /* Stop here anyway. The path is incomplete and results probably make
+         * no sense.  */
+        info->prefix_end = pos;
+        info->path_end = pos;
+        return;
+      }
+
+      /* Host name terminates here. The terminating directory separator is
+       * not part of the prefix, and initiates a new absolute path.  */
+      info->prefix_end = pos;
+    }
+
+    /* A DOS drive letter may follow a `\\.\` or `\\?\` prefix in a UNC path,
+     * or initiate a non-UNC path.  */
+    dos_dev = 0;
+
+    if(pos - path == 3 && (path[2] == '.' || path[2] == '?')) {
+      pos ++;
+      dos_dev = 1;
+    }
+    else if(pos == path)
+      dos_dev = 1;
+
+    if(dos_dev && ((pos[0] >= 'A' && pos[0] <= 'Z')
+        || (pos[0] >= 'a' && pos[0] <= 'z')) && pos[1] == ':') {
+      pos += 2;
+      info->prefix_end = pos;
+    }
+
+    /* The remaining part of the path is almost the same as POSIX.  */
+    dbcs_tb = 0;
+
+    while(*pos != 0) {
+      dir_sep = 0;
+
+      if(dbcs_tb)
+        dbcs_tb = 0;
+      else if(IsDBCSLeadByteEx(cp, *pos))
+        dbcs_tb = 1;
+      else
+        dir_sep = IS_DIR_SEP(*pos);
+
+      /* If a separator has been encountered and the previous character
+       * was not, mark this as the beginning of the terminating separator
+       * sequence.  */
+      if(dir_sep && !info->term_sep_begin)
+        info->term_sep_begin = pos;
+
+      /* If a non-separator character has been encountered and a previous
+       * terminating separator sequence exists, start a new component.  */
+      if(!dir_sep && info->term_sep_begin) {
+        info->base_sep_begin = info->term_sep_begin;
+        info->base_sep_end = pos;
+        info->term_sep_begin = NULL;
+      }
+
+      pos ++;
+    }
+
+    /* Stores the end of the path for convenience.  */
+    info->path_end = pos;
+  }
+
+char*
+dirname(char* path)
+  {
+    struct path_info info;
+    char* upath;
+    const char* top;
+    static char* static_path_copy;
+
+    if(path == NULL || path[0] == 0)
+      return (char*) ".";
+
+    do_get_path_info(&info, path);
+    upath = info.prefix_end ? info.prefix_end : path;
+    top = (IS_DIR_SEP(path[0]) || IS_DIR_SEP(upath[0])) ? "\\" : ".";
+
+    /* If a non-terminating directory separator exists, it terminates the
+     * dirname. Truncate the path there.  */
+    if(info.base_sep_begin) {
+      info.base_sep_begin[0] = 0;
+
+      /* If the unprefixed path has not been truncated to empty, it is now
+       * the dirname, so return it.  */
+      if(upath[0])
+        return path;
+    }
+
+    /* The dirname is empty. In principle we return `<prefix>.` if the
+     * path is relative and `<prefix>\` if it is absolute. This can be
+     * optimized if there is no prefix.  */
+    if(upath == path)
+      return (char*) top;
+
+    /* When there is a prefix, we must append a character to the prefix.
+     * If there is enough room in the original path, we just reuse its
+     * storage.  */
+    if(upath != info.path_end) {
+      upath[0] = *top;
+      upath[1] = 0;
+      return path;
+    }
+
+    /* This is only the last resort. If there is no room, we have to copy
+     * the prefix elsewhere.  */
+    upath = realloc(static_path_copy, info.prefix_end - path + 2);
+    if(!upath)
+      return (char*) top;
+
+    static_path_copy = upath;
+    memcpy(upath, path, info.prefix_end - path);
+    upath += info.prefix_end - path;
+    upath[0] = *top;
+    upath[1] = 0;
+    return static_path_copy;
+  }
+
+char*
+basename(char* path)
+  {
+    struct path_info info;
+    char* upath;
+
+    if(path == NULL)
+      return (char*) ".";
+
+    do_get_path_info(&info, path);
+    upath = info.prefix_end ? info.prefix_end : path;
+
+    /* If the path is non-UNC and empty, then it's relative. POSIX says '.'
+     * shall be returned.  */
+    if(IS_DIR_SEP(path[0]) == 0 && upath[0] == 0)
+      return (char*) ".";
+
+    /* If a terminating separator sequence exists, it is not part of the
+     * name and shall be truncated.  */
+    if(info.term_sep_begin)
+      info.term_sep_begin[0] = 0;
+
+    /* If some other separator sequence has been found, the basename
+     * immediately follows it.  */
+    if(info.base_sep_end)
+      return info.base_sep_end;
+
+    /* If removal of the terminating separator sequence has caused the
+     * unprefixed path to become empty, it must have comprised only
+     * separators. POSIX says `/` shall be returned, but on Windows, we
+     * return `\` instead.  */
+    if(upath[0] == 0)
+      return (char*) "\\";
+
+    /* Return the unprefixed path.  */
+    return upath;
+  }
-- 
2.34.1
_______________________________________________ Mingw-w64-public mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
