Re: gh_repl

Chris Vine Tue, 10 Jan 2012 12:35:13 -0800

On Mon, 09 Jan 2012 16:18:04 -0500
Mark H Weaver <[email protected]> wrote:
> Mike Gran <[email protected]> writes:
> >    scm_from_locale_symbol ("scheme"));
> 
> Note that it's good practice to always use `scm_from_utf8_symbol' or
> `scm_from_latin1_symbol' when the argument is a C string literal.  The
> choice of which (`utf8' or `latin1') depends on the encoding of your C
> source file.


Unless Guile does something clever, I think it would depend on the
encoding of the narrow character execution character set, which may not
be the same as the source character set (§5.2.1/1 and §5.2.1.2/1 of C11).
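
As a rough illustration of the difference, the sketch below compares an
ordinary string literal (encoded in the execution character set) with a
C11 u8 literal (always UTF-8).  The function names are made up for this
example and it assumes a C11 or later compiler; it is not taken from
Guile or from the earlier messages.

```c
#include <assert.h>
#include <string.h>

/* U+00E9 (e with acute accent) is two bytes in UTF-8, plus the NUL. */
static_assert(sizeof(u8"\u00e9") == 3, "u8 literal should be UTF-8");

/* Ordinary literal: its bytes follow the narrow execution character set.
   The character is written as a universal character name so that this
   source file stays pure ASCII. */
const char *exec_e_acute(void)
{
    return "\u00e9";
}

/* u8 literal: its bytes are UTF-8 whatever the execution character set.
   The cast keeps this valid where u8 literals have a distinct type. */
const char *utf8_e_acute(void)
{
    return (const char *)u8"\u00e9";
}

/* Returns 1 if the execution character set encoded U+00E9 exactly as
   UTF-8 does, 0 otherwise. */
int exec_charset_matches_utf8(void)
{
    return strcmp(exec_e_acute(), utf8_e_acute()) == 0;
}
```

If exec_charset_matches_utf8() returns 0, an ordinary literal would not
be safe to hand to a function that expects UTF-8, whereas the u8 form
should be.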

The execution character set (the encoding that ends up in the binary) is
implementation-defined under both C99 and C11.  If using gcc,
http://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
suggests you should be OK in assuming UTF-8 as the default encoding of
the narrow character execution character set, provided that
-finput-charset is set to the correct input file encoding.  You can,
however, use the -fexec-charset compiler flag to put something else in
the binary.
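
One way to check what actually ended up in the binary is to dump the
bytes of a literal.  The helper below is only a sketch; dump_bytes is a
name invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Write the bytes of s into out as space-separated lowercase hex.
   Returns the number of bytes in s, or -1 if out is too small. */
int dump_bytes(const char *s, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t n, i;
    char *p = out;

    assert(s != NULL && out != NULL);
    n = strlen(s);
    if (outlen < 3 * n + 1)        /* at most "xx " per byte, plus NUL */
        return -1;

    for (i = 0; i < n; i++) {
        unsigned char b = (unsigned char)s[i];
        if (i > 0)
            *p++ = ' ';
        *p++ = hex[b >> 4];
        *p++ = hex[b & 0x0f];
    }
    *p = '\0';
    return (int)n;
}
```

Called on a literal holding a single e-acute in a source file saved as
UTF-8, I would expect this to produce "c3 a9" with gcc's defaults and
"e9" when compiling with -fexec-charset=ISO-8859-1.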

The C standard refers to narrow and wide source character sets and
narrow and wide execution character sets.  gcc takes it a bit further
and first converts the encoding of the input files passed to it into its
own notion of the source character set. One curiosity is that if the
input charset is not specified via -finput-charset, gcc appears to try
to obtain the locale character set to perform this conversion:

  "-finput-charset=charset:  Set the input character set, used for
  translation from the character set of the input file to the source
  character set used by GCC. If the locale does not specify, or GCC
  cannot get this information from the locale, the default is UTF-8.
  This can be overridden by either the locale or this command line
  option. Currently the command line option takes precedence if there's
  a conflict. charset can be any encoding supported by the system's
  iconv library routine."

This means that, with gcc, source code containing non-ASCII string
literals may not be portable unless -finput-charset is passed to the
compiler.  I avoid this by always using ASCII (i.e. English) for string
literals in source files and obtaining translated text from gettext(),
which deals with the conversion programmatically and therefore portably.
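
A minimal sketch of that approach follows.  The "myapp" text domain, the
catalog directory and the function names are placeholders invented for
this example, and asking for UTF-8 output via bind_textdomain_codeset()
is an assumption about what the caller wants, not something required by
gettext.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

#define _(msgid) gettext(msgid)

/* Set up message lookup.  "myapp" and the directory are hypothetical. */
void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain("myapp", "/usr/share/locale");
    /* Ask for translations in UTF-8 regardless of the user's locale. */
    bind_textdomain_codeset("myapp", "UTF-8");
    textdomain("myapp");
}

/* The literal in the source is pure ASCII; any non-ASCII text comes from
   the message catalog at run time. */
const char *greeting(void)
{
    return _("Hello, world");
}

/* Returns 1 if a translation differing from msgid was found, else 0. */
int has_translation(const char *msgid)
{
    assert(msgid != NULL);
    return strcmp(gettext(msgid), msgid) != 0;
}
```

When no catalog is installed for the domain, gettext() returns the msgid
itself, so the program still prints the ASCII text.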

The overarching point is that, as you say, it would be wrong to assume
the execution character set bears any relation to the locale encoding
of a particular user on a particular machine.  C++ works similarly
(§2.2/5 of C++11).

We are not concerned with Windows here, but if we were, I believe Visual
Studio uses the Windows ANSI code page as the narrow character execution
character set in C and C++.

Chris
