On Wed, Feb 10, 2016 at 2:30 PM, Andrew Barnert via Python-Dev
<python-dev@python.org> wrote:
> [^3]: Say you write a program that assumes it will only be run on
> Shift-JIS systems, and you use CreateFileA to create a file named
> "ハローワールド". The actual bytes you're sending are cp436 for
> "ânâìü[âÅü[âïâh", so the file on the CD is named, in Unicode,
> "ânâìü[âÅü[âïâh".

Unless the system default was changed or the program called
SetFileApisToOEM, CreateFileA would decode using the ANSI codepage
1252, not the OEM codepage 437 (not 436), so the name would be
"ƒnƒ\x8d\x81[ƒ\x8f\x81[ƒ‹ƒh". Otherwise the example is right. But the
transcoding strategy won't work in general. For example, if the tables
are turned such that the ANSI codepage is 932 and the program passes a
bytes name encoded in codepage 1252, the user on the other end won't be
able to transcode without error if the original bytes contained invalid
DBCS sequences, because those were mapped to the default character,
U+30FB. That character transcodes back as the meaningless string
"\x81E". The user can replace that string with "--" and enjoy a nice
game of hangman.
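
Both cases can be reproduced roughly in Python. This is only a sketch
under a few assumptions: the helper decode_1252_like_windows and the
sample name "café.txt" are made up for illustration, Python's cp1252
codec rejects the five bytes that Windows maps to same-valued C1
controls (hence the helper), and Python's U+FFFD replacement character
is swapped for U+30FB to imitate the codepage 932 default character.
Python's handling of invalid DBCS sequences is not guaranteed to match
Windows byte for byte.

```python
# Bytes that are undefined in cp1252. Python's codec refuses to decode them,
# whereas Windows maps each one to the C1 control with the same value.
UNDEFINED_1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def decode_1252_like_windows(data: bytes) -> str:
    """Hypothetical helper: decode cp1252 the way the Windows ANSI API would."""
    return "".join(
        chr(b) if b in UNDEFINED_1252 else bytes([b]).decode("cp1252")
        for b in data
    )


# Case 1: Shift-JIS bytes handed to a system that uses a Western codepage.
raw = "ハローワールド".encode("cp932")
as_oem = raw.decode("cp437")              # OEM codepage (SetFileApisToOEM)
as_ansi = decode_1252_like_windows(raw)   # ANSI codepage (the default)
print(as_oem)
print(ascii(as_ansi))

# Case 2: cp1252 bytes handed to a system whose ANSI codepage is 932.
original = "café.txt"                     # hypothetical file name
sent = original.encode("cp1252")          # 0xE9 followed by "." is invalid DBCS
stored = sent.decode("cp932", errors="replace").replace("\ufffd", "\u30fb")

# The recipient tries to undo the damage by transcoding back.
roundtrip = decode_1252_like_windows(stored.encode("cp932"))
print(ascii(stored))
print(ascii(roundtrip))
```

In the first case the original bytes survive, so re-encoding with the
codepage that was actually used and decoding as codepage 932 recovers
the name. In the second case the invalid byte is already gone by the
time the name is stored, so nothing the recipient does can bring back
the "é".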