Re: [PATCH] RFC: On-demand locations within string-literals

David Malcolm Wed, 20 Jul 2016 12:38:40 -0700

On Fri, 2016-07-08 at 17:49 -0400, David Malcolm wrote:
[...]

> Also, this patch currently makes the assumption (in charset.c)
> that there's a 1:1 correspondence between bytes in the source
> character set and bytes in the execution character set.  This can
> be the case if both are, say, UTF-8, but might not hold in
> general.
> 
> The source char set is UTF-8 or UTF-EBCDIC, and safe-ctype.c has:
> 
> # if HOST_CHARSET == HOST_CHARSET_EBCDIC
>   #error "FIXME: write tables for EBCDIC"
> 
> so presumably we don't actually have any hosts that support EBCDIC
> (do we?); as far as I can tell, we only currently support UTF-8
> as the source char set.
> 
> Similarly, do we support any targets for which the execution
> character set is *not* UTF-8?


I brought this up in this thread on the gcc mailing list:
"gcc/libcpp: non-UTF-8 source or execution encodings?"
  https://gcc.gnu.org/ml/gcc/2016-07/msg00091.html
and in particular:
  https://gcc.gnu.org/ml/gcc/2016-07/msg00106.html
It's possible to select the execution character set on the command
line for C-family frontends using:
  -fexec-charset=
  -fwide-exec-charset=
e.g. "-fexec-charset=IBM1047" will give one of the variants of EBCDIC.
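
As a small illustration (my own, not from the thread), the function below reports the first byte of the literal "A" as the compiler encoded it. On a typical UTF-8 host with default options it should return 0x41; built with -fexec-charset=IBM1047 it should instead return 0xC1, the EBCDIC code for 'A', even though the byte in the source file is still 0x41. The function name is made up for this example.

```c
/* Return the first byte of the string literal "A" as encoded in the
   execution character set chosen at compile time.  */
static unsigned int
first_byte_of_A (void)
{
  return (unsigned char) "A"[0];
}
```

Printing the result once with and once without -fexec-charset=IBM1047 shows the source byte and the execution byte diverging for a single literal.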

Given that the internal interface already has a failure mode, I'm
thinking that a reasonable restriction is to only support locations
within string literals for the case where source character set ==
execution character set, and hence we have "convert_no_conversion" as
the converter.  Does that sound sane?  (I can write test coverage for
this).
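
To make the proposed restriction concrete, here's a rough sketch. Everything in it is hypothetical: the converter struct, convert_identity (standing in for convert_no_conversion), convert_other and source_offset_for_exec_byte are simplified stand-ins, not the real libcpp types or signatures. The idea is that an offset within the converted string maps back to a source offset only when the converter is the identity, and every other converter takes the existing failure path.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL: simplified stand-in for a libcpp charset converter.
   The real libcpp structures and signatures differ.  */
typedef size_t (*convert_fn) (const unsigned char *from, size_t len,
                              unsigned char *to);

struct converter_sketch
{
  convert_fn func;
};

/* Source charset == execution charset: bytes are copied unchanged, so
   byte N of the execution string is byte N of the source literal.  */
static size_t
convert_identity (const unsigned char *from, size_t len, unsigned char *to)
{
  memcpy (to, from, len);
  return len;
}

/* HYPOTHETICAL placeholder for a real conversion (e.g. to IBM1047),
   for which we don't assume a 1:1 byte correspondence.  */
static size_t
convert_other (const unsigned char *from, size_t len, unsigned char *to)
{
  (void) from;
  (void) len;
  (void) to;
  return 0;
}

/* Map a byte index within the converted literal back to a byte offset
   within the source literal.  Returns -1 (the failure mode) for any
   converter other than the identity, or for an out-of-range index.  */
static long
source_offset_for_exec_byte (const struct converter_sketch *conv,
                             size_t exec_idx, size_t literal_len)
{
  if (conv->func != convert_identity)
    return -1;
  if (exec_idx >= literal_len)
    return -1;
  return (long) exec_idx;
}
```

Keying the check on which converter is in use, rather than comparing charset names, means any future non-identity converter is rejected by default.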

[...]
