Re: [Mingw-w64-public] [PATCH] rewrite the dirname.c and basename.c without wide character processing

LIU Hao Sat, 25 Mar 2023 02:30:34 -0700

On 2023-03-25 12:35, Alvin Wong wrote:

Can we just avoid converting to wide char at all and operate only in MBCS? IsDBCSLeadByte should be enough to allow these functions to skip any false matches on the second byte of double-byte chars. And it does not matter that IsDBCSLeadByte doesn't work with UTF-8, because the UTF-8 encoding already ensures that there will be no false matches with 7-bit ASCII chars (all bytes forming multi-byte chars have the MSB set, unlike some DBCS).

While this argument is almost correct on its own (except that `IsDBCSLeadByteEx()` is preferred to `IsDBCSLeadByte()`), we should not declare these functions as working with UTF-8. As explained in a previous message, the Yen symbol (`¥`, two bytes in UTF-8: C2 A5) is a path separator in Japanese locales, and the Won symbol (`₩`, three bytes in UTF-8: E2 82 A9) is also a path separator in Korean locales; those are not something we can handle, because we can't know the encoding of the argument string.
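
For illustration only, here is a minimal sketch of the lead-byte-aware scan being discussed. It is not the code from the patch. To stay portable it uses a hypothetical helper, `is_cp932_lead_byte()`, that hard-codes the code page 932 lead-byte ranges as a stand-in for `IsDBCSLeadByteEx(932, b)`; the function name `last_separator_dbcs()` is likewise made up for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByteEx(932, b): the lead-byte
 * ranges of code page 932 (Shift-JIS) are 0x81-0x9F and 0xE0-0xFC. */
static int is_cp932_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Return a pointer to the last '/' or '\\' in `path` that is a real
 * single-byte character, or NULL if there is none. A 0x5C byte that is
 * the trail byte of a double-byte character is never examined. */
static const char *last_separator_dbcs(const char *path)
{
    const unsigned char *p = (const unsigned char *)path;
    const char *last = NULL;

    while (*p != 0) {
        if (is_cp932_lead_byte(*p) && p[1] != 0) {
            p += 2;   /* skip the lead byte and its trail byte together */
            continue;
        }
        if (*p == '/' || *p == '\\')
            last = (const char *)p;
        p++;
    }
    return last;
}
```

In code page 932 the katakana `ソ` is encoded as 83 5C, so for the bytes `C:\dir\` 83 5C `.txt` a plain `strrchr(path, '\\')` stops at the trail byte (offset 8), while the scan above returns the real separator at offset 6. The sketch only looks at bytes, so it shares the limitation described above: it cannot recognize a separator that the caller's encoding spells as a multi-byte sequence.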




--
Best regards,
LIU Hao


_______________________________________________
Mingw-w64-public mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
