On 25/3/2023 17:28, LIU Hao wrote:
> On 2023-03-25 12:35, Alvin Wong wrote:
> > Can we just avoid converting to wide char at all and operate only in
> > MBCS? IsDBCSLeadByte should be enough to allow these functions to
> > skip any false matches on the second byte of double-byte chars. And
> > it does not matter that IsDBCSLeadByte doesn't work with UTF-8,
> > because the UTF-8 encoding already ensures that there will be no
> > false matches with 7-bit ASCII chars (all bytes forming multi-byte
> > chars have the MSB set, unlike some DBCS).
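For reference, the lead-byte skipping described in the quoted paragraph could look roughly like the following portable sketch. It is not existing mingw-w64 code: `last_separator()` and both predicates are hypothetical names, and `cp932_is_lead_byte()` hard-codes the CP932 lead-byte ranges (assumed to be 0x81-0x9F and 0xE0-0xFC) as a stand-in for `IsDBCSLeadByteEx(932, c)` so that it compiles off Windows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByteEx(932, c).
   Assumption: CP932 lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static bool cp932_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* For UTF-8 no lead-byte check is needed: every byte of a multi-byte
   sequence is >= 0x80, so it can never equal '\\' or '/'. */
static bool never_lead_byte(unsigned char c)
{
    (void)c;
    return false;
}

/* Returns a pointer to the last '\\' or '/' in s, or NULL if there is none.
   The byte after a lead byte is skipped, so a trail byte of 0x5C is not
   mistaken for a path separator. */
static const char *last_separator(const char *s, bool (*is_lead)(unsigned char))
{
    const char *last = NULL;
    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;
        if (is_lead(c) && s[1] != '\0') {
            s += 2; /* skip the whole double-byte character */
            continue;
        }
        if (c == '\\' || c == '/')
            last = s;
        s++;
    }
    return last;
}
```

With the CP932 predicate, the trail byte 0x5C of a character such as 表 (0x95 0x5C) is skipped; with the always-false predicate the same bytes produce a false match, whereas UTF-8 input such as ¥ (C2 A5) or ₩ (E2 82 A9) never does.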
> While this argument is almost correct on its own (except that
> `IsDBCSLeadByteEx()` is preferred to `IsDBCSLeadByte()`), we should
> not declare these functions as working with UTF-8. As explained in a
> previous message, the Yen symbol (`¥`, two bytes in UTF-8: C2 A5) is a
> path separator in Japanese locales, and the Won symbol (`₩`, three
> bytes in UTF-8: E2 82 A9) is also a path separator in Korean locales;
This claim needs to be verified. The native path separator on Windows
should be U+005C only (the APIs also accept U+002F). While both
U+005C and U+00A5 translate to 0x5C in CP932, Windows uses Unicode to
handle files, and NTFS stores file names in Unicode. If you give Windows the
path `L"C:\134new\245folder"`, I can't really imagine it referring to
`C:\new\folder` rather than `C:\new¥folder` when the system code page is
Japanese. Of course, if you first translate the path to CP932, or if you
are using a program that does not use the Unicode Windows APIs, then you
will not be able to refer to `new¥folder`.
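The distinction can be illustrated at the code-unit level. A minimal sketch, assuming (as argued above) that the Unicode APIs treat only U+005C and U+002F as separators; `count_wide_separators()` is a hypothetical helper, not a Windows API:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: counts the code units that are assumed to act as
   path separators in the Unicode APIs (U+005C and U+002F).
   U+00A5 (YEN SIGN) is deliberately not counted. */
static size_t count_wide_separators(const wchar_t *path)
{
    size_t n = 0;
    for (; *path != L'\0'; path++) {
        if (*path == L'\\' || *path == L'/')
            n++;
    }
    return n;
}
```

For `L"C:\134new\245folder"` this counts a single separator: the octal escape `\134` is U+005C, while `\245` is U+00A5 and stays part of the name `new¥folder`.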
I think the following things need to be checked:
1. From Windows Explorer, can you create a file or folder containing
U+00A5 in its name on Japanese Windows? (Don't try from cmd.exe.)
2. If you create a file or folder containing U+00A5 on an NTFS volume
from another non-Japanese system, can you access it from Windows
Explorer on Japanese Windows?
3. Create the path `C:\new\folder` and try to access it using the
Unicode Windows API with the path `L"C:\134new\245folder"`.
4. Create the path `C:\new¥folder` (with U+00A5) and try the same.
5. Repeat the above two checks, but with an embedded manifest setting the
   active code page to UTF-8, and using the "-A" APIs with a UTF-8
   string instead.
6. Check whether MultiByteToWideChar converts 0x5C from CP932 to U+005C
or U+00A5.
Remember that U+005C and U+00A5 can look exactly the same in Japanese
fonts on Windows, so you should verify that you have the correct code
point when testing.
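To verify code points when testing, one option is to dump each code unit instead of trusting the rendered glyph. A small sketch; `format_code_points()` is a hypothetical helper, and the name to inspect would come from wherever the test obtains it (for example the `cFileName` returned by `FindFirstFileW()`):

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes each code unit of s as "U+XXXX", separated by
   spaces, into buf (always NUL-terminated if bufsize > 0). On Windows,
   wchar_t is a UTF-16 code unit, so non-BMP characters appear as two
   surrogate values. */
static void format_code_points(const wchar_t *s, char *buf, size_t bufsize)
{
    size_t used = 0;
    if (bufsize == 0)
        return;
    buf[0] = '\0';
    for (; *s != L'\0'; s++) {
        int n = snprintf(buf + used, bufsize - used, "%sU+%04lX",
                         used ? " " : "", (unsigned long)*s);
        if (n < 0 || (size_t)n >= bufsize - used)
            break; /* output truncated; buf is still terminated */
        used += (size_t)n;
    }
}
```

A name that merely looks like `new¥folder` will then show either `U+005C` or `U+00A5` at that position, which removes the font ambiguity.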
> those are not something we can handle, because we can't know the
> encoding of the argument string.
We can check the value of `GetACP()`, although I am not convinced we
need to.
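If such a check were wanted after all, it could be as small as the following sketch. `acp_renders_backslash_as_currency()` is a hypothetical name; on Windows its argument would be the result of `GetACP()`, and the code page list (932 for Japanese, 949 for Korean) is an assumption based on the locales mentioned above.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether the given ANSI code page is one where
   byte 0x5C is commonly displayed as a currency sign (assumed: CP932 -> ¥,
   CP949 -> ₩). On Windows, pass GetACP(). */
static bool acp_renders_backslash_as_currency(unsigned int cp)
{
    switch (cp) {
    case 932: /* Japanese (Shift_JIS) */
    case 949: /* Korean (Unified Hangul Code) */
        return true;
    default:
        return false;
    }
}
```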
_______________________________________________
Mingw-w64-public mailing list
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public