Several of us don't want UTF-8 quotation marks in diagnostics in our environment (Jove subshells). We'd like a way to turn them off. We don't think that they are a bad idea in general, but they are bad in our environment.
<https://gcc.gnu.org/gcc-4.0/changes.html> says:

"English-language diagnostic messages will now use Unicode quotation marks in UTF-8 locales. (Non-English messages already used the quotes appropriate for the language in previous releases.) If your terminal does not support UTF-8 but you are using a UTF-8 locale (such locales are the default on many GNU/Linux systems) then you should set LC_CTYPE=C in the environment to disable that locale. Programs that parse diagnostics and expect plain ASCII English-language messages should set LC_ALL=C. See Markus Kuhn's explanation of Unicode quotation marks for more information."

This suggests that LC_CTYPE=C would do what we want: go back to ` and ' instead of \342\200\230 and \342\200\231.

I find that a little confusing and scary. I would expect that setting LC_CTYPE=C would have the effect of changing the lexing done by the C compiler. For one thing, the set of valid characters in strings would be different. This we don't want.

gcc(1) says:

"The LC_CTYPE environment variable specifies character classification. GCC uses it to determine the character boundaries in a string; this is needed for some multibyte encodings that contain quote and escape characters that are otherwise interpreted as a string end or escape."

"The LC_MESSAGES environment variable specifies the language to use in diagnostic messages."

An experiment on my Fedora 20 system shows:

- LANG=en_CA.UTF-8 [correct]
- LC_CTYPE isn't set by default
- setting LC_CTYPE to C gets rid of the UTF-8 quotes in GCC diagnostics. That's surprising because the manpage doesn't say that it affects diagnostics.
- setting LC_MESSAGES to C DOES NOT get rid of the UTF-8 quotes in GCC diagnostics. That's surprising because the manpage does say that it affects diagnostics.

I hope that it only affects compile-time diagnostics.

That sure sounds like we should NOT set LC_CTYPE=C because of bad side-effects: it changes how the program is lexed. And the documentation gives no basis for thinking that it would suppress those UTF-8 quotes in messages (even though testing shows that this works).

That sure sounds like we should set LC_MESSAGES=C, but that doesn't work.

In our environment, our tool doesn't know that gcc is being invoked. So the solution needs to be targeted. That's why a solution like GCC_COLORS would be good. In fact, it could probably be hacked into GCC_COLORS.

Man pages in section 1 that explicitly reference LC_CTYPE:

enca enconv find gcc gnroff grep jove koi8rxterm less locale localedef nroff perl5004delta perl5160delta perl58delta perlfunc perllocale perltoc pico pilot sh systemd time tree uxterm xterm

So I feel uncomfortable setting it.

Man pages in section 1 that explicitly reference LC_MESSAGES:

apropos aspell awk bash enca enconv find gawk gcc grep hunspell install-tl locale localectl localedef lynx man nmcli perllocale perltoc sh systemd systemd-firstboot time whatis xdg-desktop-icon xdg-desktop-menu

So setting this would hardly be safer.
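
For anyone who wants to repeat the experiment above on another system, here is a rough Python sketch of how it could be scripted. It is only an illustration, not what I actually ran: the helper names (has_utf8_quotes, env_with, gcc_diagnostics, run_experiment) and the probe source are made up for this sketch, and it assumes a gcc on PATH and a UTF-8 locale in the inherited environment. It checks stderr for the byte sequences \342\200\230 and \342\200\231 under each setting.

```python
import os
import shutil
import subprocess
import tempfile

# UTF-8 encodings of U+2018 and U+2019, written in octal as in the report.
LEFT_QUOTE = b"\342\200\230"
RIGHT_QUOTE = b"\342\200\231"


def has_utf8_quotes(diagnostics: bytes) -> bool:
    """Return True if the bytes contain either Unicode quotation mark."""
    return LEFT_QUOTE in diagnostics or RIGHT_QUOTE in diagnostics


def env_with(overrides: dict) -> dict:
    """Copy the current environment, drop LC_ALL, then apply overrides."""
    env = dict(os.environ)
    # LC_ALL takes precedence over LC_CTYPE and LC_MESSAGES, so it would
    # hide the effect of the variable being tested.
    env.pop("LC_ALL", None)
    env.update(overrides)
    return env


def gcc_diagnostics(source: str, overrides: dict) -> bytes:
    """Run gcc on source (hypothetical probe) and return its stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.c")
        with open(path, "w") as f:
            f.write(source)
        result = subprocess.run(
            ["gcc", "-fsyntax-only", path],
            env=env_with(overrides),
            stderr=subprocess.PIPE,
        )
        return result.stderr


def run_experiment() -> None:
    """Print whether UTF-8 quotes appear under each locale setting."""
    if shutil.which("gcc") is None:
        print("gcc not found; nothing to probe")
        return
    # An undeclared identifier makes GCC quote a name in its error message.
    probe = "int main(void) { return undeclared_name; }\n"
    for overrides in ({}, {"LC_CTYPE": "C"}, {"LC_MESSAGES": "C"}):
        found = has_utf8_quotes(gcc_diagnostics(probe, overrides))
        label = overrides or "(unchanged)"
        print(label, "->", "UTF-8 quotes" if found else "no UTF-8 quotes")
```

Calling run_experiment() prints one line per setting. What it prints depends on the GCC version and on the locale the script inherits, so treat the output as a local observation rather than documented behaviour.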