On 09/09/2016 05:17 PM, Martin Sebor wrote:
Both styles are ambiguous, but isn't that an inherent problem once we
try to avoid non-printable characters by rendering them as octal or hex
On 09/09/2016 07:59 AM, Joseph Myers wrote:
On Thu, 8 Sep 2016, Martin Sebor wrote:
PS I used hexadecimal based on what c-format.c does but now that
I checked more carefully how %qE formats string literals I see it
uses octal. I think hexadecimal is preferable because it avoids
ambiguity but I'm open to changing it to octal if there's a strong
I'm not clear what you mean about ambiguity. In C strings, an octal
escape sequence has up to three characters, so if it has three characters
it's unambiguous, whereas a hex escape sequence can have any number of
characters, so if the unprintable character is followed by a valid hex
digit then in C you need to represent that as an escape (or use string
constant concatenation, etc.). The patch doesn't try to do that as
far as I can see.
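
To illustrate the escape-length rule described above, here is a minimal
sketch in Python (not GCC code; the decode_c_escapes helper is
hypothetical and handles only octal and hex escapes). It assumes the C
rules as stated: an octal escape takes at most three digits, while a hex
escape takes every hex digit that follows.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_c_escapes(text):
    """Return the character values encoded by the body of a C literal.

    Hypothetical helper: only octal and hex escapes are understood;
    every other character stands for itself.
    """
    values = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in OCTAL_DIGITS:
            # Octal escape: consume at most three digits.
            j = i + 1
            while j < len(text) and j < i + 4 and text[j] in OCTAL_DIGITS:
                j += 1
            values.append(int(text[i + 1:j], 8))
            i = j
        elif text.startswith("\\x", i) and i + 2 < len(text) and text[i + 2] in HEX_DIGITS:
            # Hex escape: consume every hex digit that follows, no upper limit.
            j = i + 2
            while j < len(text) and text[j] in HEX_DIGITS:
                j += 1
            values.append(int(text[i + 2:j], 16))
            i = j
        else:
            values.append(ord(text[i]))
            i += 1
    return values


print(decode_c_escapes(r"\1234"))   # octal stops after \123; '4' is separate
print(decode_c_escapes(r"\x1234"))  # hex swallows all four digits
```

Under these rules "\1234" yields two values while "\x1234" yields a
single value of 0x1234, which is why a hex escape followed by a literal
hex digit needs special handling in C source.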
Now, presumably the output isn't intended to be interpreted as C strings
anyway (if it was, you'd need to escape " and \ as well), so the patch is
OK, but I don't think it avoids ambiguity (and there's a clear case that
it shouldn't - that if the string passed to %qs is printable, it is
printed as-is even if it contains escape sequences that could also result
from a non-printable string passed to %qs).
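
The collision described above can be sketched as follows. This is an
illustration in Python, not the patch itself: render_quoted is a
hypothetical stand-in for the directive, assumed to copy printable ASCII
unchanged and to print anything else as a two-digit hex escape.

```python
def render_quoted(data):
    """Hypothetical renderer: printable ASCII is copied as-is,
    every other byte becomes a two-digit hex escape."""
    out = []
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            out.append(chr(byte))
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


non_printable = b"\x0134"   # the byte 0x01 followed by "34"
printable = b"\\x0134"      # backslash, 'x', '0', '1', '3', '4'

# Two different inputs produce the same rendered text.
print(render_quoted(non_printable))
print(render_quoted(printable))
```

Telling the two apart in the output would require escaping the backslash
itself, which this sketch deliberately does not do.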
I tried to be clear about it in the description of the changes
but I see the PS caused some confusion. Let me clarify that
the patch has nothing to do with ambiguity (perceived or
real) in the representation of the escape sequences. The only
purpose of the change is to avoid printing non-printable
characters or excessively large escape sequences in GCC
I mentioned the hex vs octal notation to invite input into which
of the two of them people would prefer to see used by the %qc and
%qs directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).
What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends. Is it '\1'
followed by "234" or '\12' followed by "34" or '\123' followed
by "4"? (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.) To be clear: I'm talking about the GCC output and not
necessarily about what the standard has to say about it.
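
A small sketch of that ambiguity, in Python rather than GCC's own code
(render_octal and its fixed_width flag are hypothetical names used only
for illustration): with as few digits as the value needs, two different
inputs render identically, while a fixed three-digit form keeps them
apart.

```python
def render_octal(data, fixed_width):
    """Hypothetical renderer: non-printable bytes become octal escapes.

    With fixed_width=True every escape has exactly three digits;
    otherwise it has only as many digits as the value needs.
    """
    out = []
    for byte in data:
        if 0x20 <= byte <= 0x7E:
            out.append(chr(byte))
        elif fixed_width:
            out.append("\\%03o" % byte)
        else:
            out.append("\\%o" % byte)
    return "".join(out)


a = b"\x01" + b"234"   # '\1' followed by "234"
b = b"\n" + b"34"      # '\12' followed by "34"

# Minimal width: both inputs print the same text.
print(render_octal(a, False), render_octal(b, False))
# Fixed three digits: the two inputs print differently.
print(render_octal(a, True), render_octal(b, True))
```

The fixed-width output is only unambiguous to a reader who knows the
width is always three, which is the caveat made above.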
In contrast to the octal notation, I find the string "\x1234"
clearer. It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be
the latter because representing characters using two hex digits
is more common. But this is just my own perception and YMMV.
I can't make a strong argument for either style over the other.