On Mon, 09 Jan 2012 16:18:04 -0500 Mark H Weaver <[email protected]> wrote:
> Mike Gran <[email protected]> writes:
> > scm_from_locale_symbol ("scheme"));
>
> Note that it's good practice to always use `scm_from_utf8_symbol' or
> `scm_from_latin1_symbol' when the argument is a C string literal. The
> choice of which (`utf8' or `latin1') depends on the encoding of your C
> source file.

Unless Guile does something clever, I think it would depend on the encoding of the narrow character execution character set, which may not be the same as the source character set (§5.2.1/1 and §5.2.1.2/1 of C11). The execution character set (the encoding appearing in the binary) is implementation-defined according to C99/C11.

If using gcc, http://gcc.gnu.org/onlinedocs/cpp/Character-sets.html suggests you should be OK in assuming UTF-8 as the default encoding of the narrow character execution character set, provided that -finput-charset is set to the correct input file encoding. You can use the -fexec-charset compiler flag to put something else in the binary though.

The C standard refers to narrow and wide source character sets and narrow and wide execution character sets. gcc takes it a bit further and first converts the encoding of the input files passed to it into its own notion of the source character set. One curiosity is that if the input charset is not specified via -finput-charset, gcc appears to try to obtain the locale character set to perform this conversion:

"-finput-charset=charset: Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command line option. Currently the command line option takes precedence if there's a conflict. charset can be any encoding supported by the system's iconv library routine."

This means that with gcc, source code containing non-ASCII string literals may not be portable unless -finput-charset is passed to the compiler. I avoid this by always using ASCII (i.e. English) for string literals in source files and obtaining translated text from gettext(), which deals with the conversion programmatically and therefore portably.

The overarching point is that, as you say, it would be wrong to assume the execution character set bears any relation to the locale encoding of a particular user on a particular machine. C++ works similarly (§2.2/5 of C++11). We are not concerned with Windows here, but if we were, I believe Visual Studio uses Windows ANSI as the narrow character execution character set in C and C++.

Chris
