Thomas Dickey <dickey <at> his.com> writes:

> > This means that characters 0..127 have to be treated as ASCII, but

No, it means that portable characters and control characters must be
< 128.  ASCII meets this requirement, but so does EBCDIC, as well as
UTF-8.  The C locale also implies that you can manipulate bytes >= 128
in the naive manner, so long as you don't care about characters embedded
in those bytes.  And what do you know - ASCII, EBCDIC, and UTF-8 all
meet this property, too.

> > beyond that an implementation can do what it wants.  And on Cygwin 1.7,
> > plain "C" actually does imply UTF-8, which happily is
> > backward-compatible with ASCII.
>
> That's an interpretation that so far hasn't been blessed by the standards
> people.  Any discussion of this topic should mention that, as a caveat.

Actually, the standards people HAVE spoken - and they agreed with our
interpretation.  POSIX was INTENTIONALLY written so that a UTF-8
encoding is valid for the C locale, for the same reason that it was
written so that an EBCDIC encoding is valid for the C locale.  These
emails from the Austin Group (the folks that write POSIX) are telling:

https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=12982
https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13012

But they also admitted that more work is still needed in POSIX to
codify this intent clearly (for example, that control characters must
be single bytes < 128).

--
Eric Blake