Sorry, starting with Windows 10 version 1803 (10.0.17134.0), the
Universal C Runtime supports using a UTF-8 code page, so two bytes for every
wide character may not be enough. It is best to use wcstombs (NULL, wcstr, 0)
to get the size correctly. Here is the patch.

傅继晗 <[email protected]> wrote on Thu, Mar 23, 2023 at 16:21:

> I was going to refactor these two functions, but after LIU Hao's
> explanation I realized that I had taken it too lightly and that Windows
> needs more thought than Linux, so I gave up on refactoring and just
> fixed the bugs.
> Even when the locale is consistent with the input filename, the problem
> remains: non-ASCII filenames get truncated.
> After debugging, I found that the *len* variable passed as the third
> parameter of the `wcstombs` function (line 99 in basename.c and line 142 in
> dirname.c) holds the value returned by `mbstowcs`. However, the *len* value
> returned by `mbstowcs` is a number of characters, while the third
> parameter of `wcstombs` is the size of the target buffer in bytes.
> As Microsoft mentions in the documentation
> <https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/wcstombs-wcstombs-l?view=msvc-170>:
> "*In general, it isn't known how many bytes will be required when
> converting a wide-character string. Some wide characters will require only
> a single byte in the output string; others require 2 bytes. If there are
> 2 bytes in the multibyte output string for every wide character in the
> input string (including the wide character NULL), the result is
> guaranteed to fit.*"
> I think it would be more appropriate for this third parameter to be twice
> the number of characters; otherwise multibyte characters get truncated, as
> this bug shows <https://sourceforge.net/p/mingw-w64/bugs/227/>.
> Here is my patch and test case, just two line changes.
> Thanks.
>
>
From 58a49c51fe42733d1ecd8059d41abf138f0d9d47 Mon Sep 17 00:00:00 2001
From: FunnyBiu <[email protected]>
Date: Thu, 23 Mar 2023 18:07:07 +0800
Subject: [PATCH] Add files via upload

Even when the locale is consistent with the input filename, non-ASCII
filenames still get truncated. After debugging, I found that the *len*
variable passed as the third parameter of the `wcstombs` function (line 99
in basename.c and line 142 in dirname.c) holds the value returned by
`mbstowcs`. However, the value returned by `mbstowcs` is a number of
characters, while the third parameter of `wcstombs` is the size of the
target buffer in bytes. I think it is best to use wcstombs (NULL, wcstr, 0)
to get the size correctly.
---
 mingw-w64-crt/misc/basename.c | 3 ++-
 mingw-w64-crt/misc/dirname.c  | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mingw-w64-crt/misc/basename.c b/mingw-w64-crt/misc/basename.c
index c45dbbb36..6464cc777 100644
--- a/mingw-w64-crt/misc/basename.c
+++ b/mingw-w64-crt/misc/basename.c
@@ -96,7 +96,8 @@ basename (char *path)
               * then we transform the full normalised path back into 
               * the multibyte character domain, and skip over the dirname,
               * to return the resolved basename.  */
-             if ((len = wcstombs( path, refcopy, len)) != (size_t)(-1))
+                 len= wcstombs( NULL, refcopy, 0);
+             if (( wcstombs( path, refcopy, len)) != (size_t)(-1))
                path[len] = '\0';
              *refname = L'\0';
              if ((len = wcstombs( NULL, refcopy, 0 )) != (size_t)(-1))
diff --git a/mingw-w64-crt/misc/dirname.c b/mingw-w64-crt/misc/dirname.c
index 9c5cf87db..ce3252575 100644
--- a/mingw-w64-crt/misc/dirname.c
+++ b/mingw-w64-crt/misc/dirname.c
@@ -139,7 +139,8 @@ dirname(char *path)
              /* finally ...
               * transform the resolved dirname back into the multibyte char domain,
               * restore the caller's locale, and return the resultant dirname.  */
-             if ((len = wcstombs( path, refcopy, len )) != (size_t)(-1))
+                 len= wcstombs( NULL, refcopy, 0);
+             if (wcstombs( path, refcopy ,len) != (size_t)(-1))
                path[len] = '\0';
            }
          else
#include <stdio.h>
#include <locale.h>
extern char * __cdecl basename (char *path);
extern char * __cdecl dirname (char *path);

void xprint(const char *s)
{
    while (*s)
        printf("\\x%02x", (int)(unsigned char)(*s++));
}

int main(int argc, char **argv)
{
    char input[] ={0xe6,0x98,0x9f,0x00};// UTF-8 encoding of 星
    char input2[] ={0x2f,0xca,0xab,0x79,0x2f,0xca,0xab,0x7a,0x00};// GBK encoding of /诗y/诗z
    //char input[] ={0xe8,0xaf,0x97,0x7a,0x00};// UTF-8 encoding of 诗z
    //char input2[] ={0x2f,0xe8,0xaf,0x97,0x79,0x2f,0xe8,0xaf,0x97,0x7a,0x00};// UTF-8 encoding of /诗y/诗z
    char *output;
    printf("basename(\"");
    xprint(input);
    printf("\") = \"");
    output = basename(input);
    xprint(output);
    printf("\"\ndirname(\"");   /* close the basename result quote first */
    xprint(input2);
    printf("\") = \"");
    output = dirname(input2);
    xprint(output);
    printf("\"\n");
 
    return 0;
}

_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
