I was going to refactor these two functions, but after LIU Hao's
explanation, I realized that I had underestimated the problem and that
Windows needs more careful thought than Linux here, so I gave up on the
refactoring and just fixed the bugs.
Even when the locale matches the encoding of the input filename, the same
problem occurs: non-ASCII filenames get truncated. After debugging it, I
found that the *len* variable passed as the third parameter of `wcstombs`
(line 99 in basename.c and line 142 in dirname.c) holds the value returned
by `mbstowcs`. However, the *len* value returned by `mbstowcs` is a count of
wide characters, while the third parameter of `wcstombs` is the size of the
target buffer in bytes, as Microsoft mentions in the documentation
<https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/wcstombs-wcstombs-l?view=msvc-170>:
"*In general, it isn't known how many bytes will be required when
converting a wide-character string. Some wide characters will require only
one byte in the output string; others require two. If there are 2 bytes in
the multibyte output string for every wide character in the input string
(including the wide character NULL), the result is guaranteed to fit.*"
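To make the character/byte distinction concrete, here is a minimal sketch in standard C; it is not code from basename.c or dirname.c. The helper names `wide_char_count` and `multibyte_byte_count` are hypothetical. Both rely on `mbstowcs`/`wcstombs` accepting a NULL destination to only measure, which the Microsoft CRT and glibc both support (basename.c already uses that form). The results depend on the LC_CTYPE locale in effect.

```c
#include <assert.h>
#include <locale.h>  /* setlocale(): the counts depend on LC_CTYPE */
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: number of wide characters in s, as reported by
 * mbstowcs(NULL, s, 0). This is the kind of value basename.c/dirname.c
 * keep in len. Returns (size_t)-1 if s is invalid in the current locale. */
size_t wide_char_count(const char *s)
{
    assert(s != NULL);
    return mbstowcs(NULL, s, 0);
}

/* Hypothetical helper: number of bytes (excluding the terminating NUL)
 * that wcstombs() needs to convert the wide form of s back again. */
size_t multibyte_byte_count(const char *s)
{
    size_t n = wide_char_count(s);
    if (n == (size_t)-1)
        return (size_t)-1;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return (size_t)-1;
    mbstowcs(w, s, n + 1);                /* n + 1 also stores L'\0' */

    size_t bytes = wcstombs(NULL, w, 0);  /* NULL destination: measure only */
    free(w);
    return bytes;
}
```

In a UTF-8 locale, for example, `"\xe8\xaf\x97z"` (诗z) is 2 wide characters but needs 4 bytes, so a byte limit of 2 cannot hold it.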
I think it would be more appropriate for this third parameter to be twice
the number of wide characters; otherwise any string whose multibyte form is
longer than its character count gets truncated, as this bug shows
<https://sourceforge.net/p/mingw-w64/bugs/227/>. Here are my patch and test
case, just a two-line change. Thanks.
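The sketch below (hypothetical, standard C only) repeats the round trip the two functions perform, multibyte to wide and back, but takes the byte limit passed to `wcstombs` as a parameter, so the old limit (`len`) and the patched one (`2*len`) can be compared. The name `roundtrip` is made up for illustration, and the caller is assumed to have selected LC_CTYPE with `setlocale` beforehand.

```c
#include <assert.h>
#include <locale.h>  /* callers pick LC_CTYPE with setlocale() first */
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical demo: convert s to wide characters and back into out,
 * passing `count` as wcstombs' third argument the way basename.c and
 * dirname.c pass len. out must hold at least count + 1 bytes.
 * Returns the number of bytes written, or (size_t)-1 on error. */
size_t roundtrip(const char *s, char *out, size_t count)
{
    assert(s != NULL && out != NULL);
    size_t len = mbstowcs(NULL, s, 0);    /* number of CHARACTERS */
    if (len == (size_t)-1)
        return (size_t)-1;

    wchar_t *w = malloc((len + 1) * sizeof *w);
    if (w == NULL)
        return (size_t)-1;
    mbstowcs(w, s, len + 1);              /* also stores L'\0' */

    size_t n = wcstombs(out, w, count);   /* count is a BYTE limit */
    free(w);
    if (n != (size_t)-1)
        out[n] = '\0';                    /* same fix-up as the CRT code */
    return n;
}
```

With a UTF-8 locale and the 6-byte string 诗诗 (2 wide characters), both `count = 2` and `count = 4` return fewer than 6 bytes, and only a limit of 6 returns the whole string. So `2*len` matches the bound the Microsoft documentation gives for double-byte code pages such as GBK, where a character never needs more than 2 bytes, but in a UTF-8 locale one wide character can need 3 bytes and the doubled limit can still truncate.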
#include <stdio.h>
#include <locale.h>
extern char * __cdecl basename (char *path);
extern char * __cdecl dirname (char *path);

/* Print every byte of s as a \xNN escape, so the result is visible
   regardless of the console code page. */
void xprint(const char *s)
{
    while (*s)
        printf("\\x%02x", (int)(unsigned char)(*s++));
}

/* Assumes the system default code page is GBK (936); for a UTF-8 locale,
   use the commented-out arrays instead. */
int main(int argc, char **argv)
{
    char input[] = {0xca,0xab,0x7a,0x00};  // GBK encoding of 诗z
    char input2[] = {0x2f,0xca,0xab,0x79,0x2f,0xca,0xab,0x7a,0x00};  // GBK encoding of /诗y/诗z
    //char input[] = {0xe8,0xaf,0x97,0x7a,0x00};  // UTF-8 encoding of 诗z
    //char input2[] = {0x2f,0xe8,0xaf,0x97,0x79,0x2f,0xe8,0xaf,0x97,0x7a,0x00};  // UTF-8 encoding of /诗y/诗z
    char *output;

    printf("basename(\"");
    xprint(input);
    printf("\") = \"");
    output = basename(input);
    xprint(output);
    printf("\"\ndirname(\"");  /* close the basename result's quote first */
    xprint(input2);
    printf("\") = \"");
    output = dirname(input2);
    xprint(output);
    printf("\"\n");

    return 0;
}

From 9972d980d7e7c0b7afdbc2b9833fca4cd1e492e7 Mon Sep 17 00:00:00 2001
From: FunnyBiu <[email protected]>
Date: Thu, 23 Mar 2023 15:55:26 +0800
Subject: [PATCH] fix truncation

---
 mingw-w64-crt/misc/basename.c | 2 +-
 mingw-w64-crt/misc/dirname.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mingw-w64-crt/misc/basename.c b/mingw-w64-crt/misc/basename.c
index c45dbbb36..23e5abba1 100644
--- a/mingw-w64-crt/misc/basename.c
+++ b/mingw-w64-crt/misc/basename.c
@@ -96,7 +96,7 @@ basename (char *path)
               * then we transform the full normalised path back into 
               * the multibyte character domain, and skip over the dirname,
               * to return the resolved basename.  */
-             if ((len = wcstombs( path, refcopy, len)) != (size_t)(-1))
+             if ((len = wcstombs( path, refcopy, 2*len)) != (size_t)(-1))
                path[len] = '\0';
              *refname = L'\0';
              if ((len = wcstombs( NULL, refcopy, 0 )) != (size_t)(-1))
diff --git a/mingw-w64-crt/misc/dirname.c b/mingw-w64-crt/misc/dirname.c
index 9c5cf87db..6f7774948 100644
--- a/mingw-w64-crt/misc/dirname.c
+++ b/mingw-w64-crt/misc/dirname.c
@@ -139,7 +139,7 @@ dirname(char *path)
              /* finally ...
               * transform the resolved dirname back into the multibyte char domain,
               * restore the caller's locale, and return the resultant dirname.  */
-             if ((len = wcstombs( path, refcopy, len )) != (size_t)(-1))
+             if ((len = wcstombs( path, refcopy, 2*len )) != (size_t)(-1))
                path[len] = '\0';
            }
          else
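An alternative to a fixed multiplier, sketched here only for comparison and not part of the patch, is to ask `wcstombs` for the exact byte count first and convert only when the result fits the destination. `wcs_to_mbs_checked` is a hypothetical helper; in basename.c/dirname.c the caller would still need to know the real capacity of `path`, which this sketch assumes is passed in explicitly as `dst_size`.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not mingw-w64 API): convert src into dst, whose
 * capacity is dst_size bytes including the terminating NUL. Instead of
 * guessing a byte limit from the character count, it first asks
 * wcstombs() how many bytes are needed. Returns the number of bytes
 * written (excluding NUL), or (size_t)-1 if src cannot be converted in
 * the current locale or would not fit. */
size_t wcs_to_mbs_checked(char *dst, size_t dst_size, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    size_t need = wcstombs(NULL, src, 0);     /* bytes, excluding NUL */
    if (need == (size_t)-1 || need >= dst_size)
        return (size_t)-1;                    /* invalid, or no room for NUL */

    size_t n = wcstombs(dst, src, need + 1);  /* room for the NUL as well */
    if (n == (size_t)-1)
        return n;
    dst[n] = '\0';                            /* n <= need < dst_size */
    return n;
}
```

Returning `(size_t)-1` when the string does not fit lets the caller fall back explicitly instead of silently truncating the name.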
_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
