yuja added a comment.
> - # Since Python 3 converts argv to wchar_t type by Py_DecodeLocale() on Unix, > - # we can use os.fsencode() to get back bytes argv. > - # > - # https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55 > - # > - # On Windows, the native argv is unicode and is converted to MBCS bytes > - # since we do enable the legacy filesystem encoding. if getattr(sys, 'argv', None) is not None: > - sysargv = list(map(os.fsencode, sys.argv)) > > + # On POSIX, the char** argv array is converted to Python str using > + # Py_DecodeLocale(). The inverse of this is Py_EncodeLocale(), which isn't > + # directly callable from Python code. So, we need to emulate it. > + # Py_DecodeLocale() calls mbstowcs() and falls back to mbrtowc() with > + # surrogateescape error handling on failure. These functions take the > + # current system locale into account. So, the inverse operation is to > + # .encode() using the system locale's encoding and using the > + # surrogateescape error handler. The only tricky part here is getting > + # the system encoding correct, since `locale.getlocale()` can return > + # None. We fall back to the filesystem encoding if lookups via `locale` > + # fail, as this seems like a reasonable thing to do. > + # > + # On Windows, the wchar_t **argv is passed into the interpreter as-is. > + # Like POSIX, we need to emulate what Py_EncodeLocale() would do. But > + # there's an additional wrinkle. What we really want to access is the > + # ANSI codepage representation of the arguments, as this is what > + # `int main()` would receive if Python 3 didn't define `int wmain()` > + # (this is how Python 2 worked). To get that, we encode with the mbcs > + # encoding, which will pass CP_ACP to the underlying Windows API to > + # produce bytes. > + if os.name == r'nt': > + sysargv = [a.encode("mbcs", "ignore") for a in sys.argv] On Windows, my assumption was os.fsencode() == .encode("mbcs") if sys._enablelegacywindowsfsencoding(). So this looks good to me. Perhaps, the "ignore" error mode would match the legacy Windows behavior. > + else: > + encoding = ( > + locale.getlocale()[1] > + or locale.getdefaultlocale()[1] > + or sys.getfilesystemencoding() > + ) > + sysargv = [a.encode(encoding, "surrogateescape") for a in sys.argv] I'm not pretty sure if the locale encoding is the encoding Py_DecodeLocale() would use. There are many ifdefs for `__APPLE__`. The doc says use `os.fsencode()`, but that's no longer valid (or wrong from the start)? https://docs.python.org/3/library/sys.html#sys.argv Something might be changed around 3.7 or 3.8. Since bytes argv handling has been moved from `int main()` to `preconfig.c`, things could become more dynamic. But I don't know. Just my guess. Overall, the new code looks good, but I have no idea if that's more correct. REPOSITORY rHG Mercurial CHANGES SINCE LAST ACTION https://phab.mercurial-scm.org/D8337/new/ REVISION DETAIL https://phab.mercurial-scm.org/D8337 To: indygreg, #hg-reviewers Cc: yuja, mercurial-devel _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel