Eryk Sun <eryk...@gmail.com> added the comment:

> How about treating only UTF-8 and leave legacy environment as-is?
> * When GetConsoleCP() returns CP_UTF8, use UTF-8 for stdin. 
> Otherwise, use ANSI.

Okay, and also when GetConsoleCP() fails because there's no console (e.g. 
python.exe with the DETACHED_PROCESS creation flag, or pythonw.exe). 
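
As a rough sketch of that selection rule (not CPython's actual implementation): the helper names below are hypothetical, "mbcs" is used as the name of Python's ANSI codec on Windows, and treating a failed GetConsoleCP() call (return value 0) as a UTF-8 case is my reading of the reply above.

```python
import sys

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def choose_stdin_encoding(console_cp: int, ansi_encoding: str = "mbcs") -> str:
    """Pick a stdin encoding from the value returned by GetConsoleCP().

    console_cp == 0 means the call failed, e.g. because no console is
    attached (DETACHED_PROCESS or pythonw.exe). Treating that case as
    UTF-8 is an assumption about what the comment proposes.
    """
    if console_cp == CP_UTF8 or console_cp == 0:
        return "utf-8"
    # Legacy environment: leave it as-is and use the ANSI code page.
    return ansi_encoding


def get_console_input_cp() -> int:
    """Return GetConsoleCP(), or 0 when not on Windows or when it fails."""
    if sys.platform != "win32":
        return 0
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return kernel32.GetConsoleCP()


if __name__ == "__main__":
    print(choose_stdin_encoding(get_console_input_cp()))
```

The decision is kept in a pure function so it can be checked without a console; only get_console_input_cp() touches the Windows API.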

However, using UTF-8 for the input code page is currently broken in many cases, 
so it should not be promoted as a recommended solution until Microsoft fixes 
their broken code (which should have been fixed 20 years ago; it's ridiculous). 
Legacy console applications rely on ReadFile and ReadConsoleA. With the input 
code page set to UTF-8, these calls can only read 7-bit ASCII (ordinals 0-127). 
Other characters get converted to null bytes. For example:

    >>> import ctypes, os
    >>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    >>> kernel32.SetConsoleCP(65001)
    1
    >>> os.read(0, 10)
    ab¡¢£¤cd
    b'ab\x00\x00\x00\x00cd\r\n'

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42707>
_______________________________________