On 2023-03-25 12:35, Alvin Wong wrote:
> Can we just avoid converting to wide char at all and operate only in MBCS? IsDBCSLeadByte should be enough to allow these functions to skip any false matches on the second byte of double-byte chars. And it does not matter that IsDBCSLeadByte doesn't work with UTF-8, because the UTF-8 encoding already ensures that there will be no false matches with 7-bit ASCII chars (all bytes forming multi-byte chars have the MSB set, unlike some DBCS).
While the quoted argument is almost correct on its own (except that `IsDBCSLeadByteEx()` is preferred to `IsDBCSLeadByte()`), we should not declare these functions as working with UTF-8. As explained in a previous message, the Yen symbol (`¥`, two bytes in UTF-8: C2 A5) is a path separator in Japanese locales, and the Won symbol (`₩`, three bytes in UTF-8: E2 82 A9) is a path separator in Korean locales. We cannot handle those, because we cannot know the encoding of the argument string.
-- Best regards, LIU Hao
_______________________________________________ Mingw-w64-public mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
