Re: [PATCH] Whitespace issues during preprocessing

Harald van Dijk Wed, 05 Feb 2014 14:40:13 -0800

On 04/02/14 22:24, Richard Smith wrote:
> On Tue, Feb 4, 2014 at 1:13 PM, Harald van Dijk <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     On 04/02/14 20:25, Justin Bogner wrote:
>     > Harald van Dijk <[email protected] <mailto:[email protected]>>
>     writes:
>     >> Attached are my updated patches intended to fix various whitespace
>     >> issues in clang, modified to take Justin Bogner's comments into
>     account.
>     >> They are intended to ensure that whitespace is not inappropriately
>     >> removed just because a macro or macro argument expansion is
>     empty, and
>     >> does get removed during token pasting.
>     >
>     > I've committed the first four patches for you: r200785 through
>     r200788.
> 
>     Thanks!
> 
>     >> I have moved the handling of invalid token pastes from
>     >> TokenLexer::ExpandFunctionArguments to TokenLexer::Lex, so that
>     it works
>     >> for both object- and function-like macros, and both when ##'s
>     operands
>     >> use macro parameters and when they don't.
>     >
>     > Given that the tests needed to be changed and the behaviour clearly
>     > hasn't followed the comment in a while, I'm not entirely convinced
>     this
>     > is the right thing to do. Could the comment simply be wrong? Are
>     people
>     > relying on one behaviour or the other for invalid token pastes in
>     > assembler-with-cpp?
>     >
>     > Basically, is there a way to objectively say one of these
>     behaviours is
>     > better than the other?
> 
>     Having looked closer, the current behaviour is inconsistent in a way
>     that cannot really be explained to someone not familiar with a few
>     implementation details. Given
> 
>     #define foo(x) (. ## x . ## y)
> 
>     foo(y) actually expands to (.y . y) in assembler-with-cpp mode, with or
>     without my first four patches. That first result is what the comment
>     refers to; could the fact that the second result is different be an
>     oversight? There does not seem to be a reason for this difference, at
>     least. Surprisingly though, this is exactly what GCC does too.
> 
>     My last patch would have turned this into (.y .y), removing the second
>     space. That makes sense to me. (. y . y) could also be a perfectly
>     sensible result.
> 
> 
> I think (.y .y) is probably the best answer here (consistently remove
> whitespace on both sides of ## wherever it appears) -- this also
> probably better matches what MSVC does (where token-paste can result in
> multiple tokens, and seems to act as if the tokens were merely abutted
> with no adjacent whitespace). This would also behave more consistently
> when processing languages with a different lexical structure from C --
> in a language where an identifier can start with a period, (.y .y) seems
> to unambiguously be the right answer.


That is a good point about MSVC. There is actually one test case
affected by this change, using -fms-extensions, where the behaviour did
not match that of MSVC, and does now, at least for the compiler that
comes with Visual Studio 2013.

>     I suspect, but do not actually know, that nobody really uses . ## foo
>     unless foo is a macro argument, because when foo is not a macro
>     argument, there is no point in using ## in the first place. It would
>     explain why this went unnoticed for so long. And I can imagine a few
>     cases where this would be useful: assembler directives that have subtly
>     different syntax, depending on the assembler used. x y would be
>     insufficient if x is ., if y is size, and if .size is a directive, but .
>     size is a syntax error. ## would work here.
> 
>     Even if clang's behaviour should change, though, my patch does not have
>     adequate testing for these special cases, so shouldn't be applied as is.
>     Should I work on better tests, or do you think it is more likely that
>     the behaviour should remain unchanged and get decent testing?
> 
> 
> There's a risk of breaking someone's assumption by making a change here
> (especially since we'd be introducing a difference from GCC) but I think
> the new behavior is much more defensible, and there's probably no way to
> find out without trying it.

All right. I have now taken a closer look at the test cases where the
behaviour changes, and noticed that for blargh ## $foo -> blargh $foo it
would not be right to simply change the test case to blargh$foo, as that
misses the point of the test.

There was also already a test for . ## foo, but it only tested the case
where foo is a macro argument. I have extended that test to also check
what happens when foo is not a macro argument.

How does the attached patch look? I've re-tested it on sources of today,
but on top of my patch in the "r200787 - Fix whitespace handling in
empty macro expansions" thread (after your okay for that).

Cheers,
Harald van Dijk

Fix handling of whitespace in invalid token pastes

A comment in TokenLexer shows that the intent is that x ## y is always
permitted in assembler-with-cpp mode, even if the paste does not result
in a valid token, and that any whitespace before ## or before y is
ignored in that case.

The first two existing tweaked tests in assembler-with-cpp.c do not care
about whitespace, and are updated to reflect the new behaviour. The
third does: it checks that blargh$foo is not an identifier by
concatenating blargh and $foo, and checking whether a space is present
in the output. Since this no longer works, an alternative means of
checking this is required: if blargh$foo is an identifier, and blargh is
a macro, then it does not get expanded. The fourth test is exactly what
this change is about, and gets an extra test for what used to fail.

The preprocessed output for macro_paste_identifier_error.c, which
specifically tests -fms-extensions, now matches that of MSVC.
---
 lib/Lex/TokenLexer.cpp                           | 14 +++++++-------
 test/Preprocessor/assembler-with-cpp.c           | 12 +++++++-----
 test/Preprocessor/macro_paste_identifier_error.c |  2 +-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/lib/Lex/TokenLexer.cpp b/lib/Lex/TokenLexer.cpp
index 543ca92..9598ddd 100644
--- a/lib/Lex/TokenLexer.cpp
+++ b/lib/Lex/TokenLexer.cpp
@@ -346,13 +346,6 @@ void TokenLexer::ExpandFunctionArguments() {
       // If this token (the macro argument) was supposed to get leading
       // whitespace, transfer this information onto the first token of the
       // expansion.
-      //
-      // Do not do this if the paste operator occurs before the macro argument,
-      // as in "A ## MACROARG".  In valid code, the first token will get
-      // smooshed onto the preceding one anyway (forming AMACROARG).  In
-      // assembler-with-cpp mode, invalid pastes are allowed through: in this
-      // case, we do not want the extra whitespace to be added.  For example,
-      // we want ". ## foo" -> ".foo" not ". foo".
       if (NextTokGetsSpace)
         ResultToks[ResultToks.size()-NumToks].setFlag(Token::LeadingSpace);
 
@@ -434,6 +427,13 @@ bool TokenLexer::Lex(Token &Tok) {
 
   bool TokenIsFromPaste = false;
 
+  // If this token follows a token paste (##) operator, we have an invalid
+  // paste.  In assembler-with-cpp mode, invalid pastes are allowed through: in
+  // this case, we do not want the extra whitespace to be added.  For example,
+  // we want ". ## foo" -> ".foo" not ". foo".
+  if (!isFirstToken && Tokens[CurToken - 2].is(tok::hashhash) && Macro)
+    Tok.clearFlag(Token::LeadingSpace);
+
   // If this token is followed by a token paste (##) operator, paste the tokens!
   // Note that ## is a normal token when not expanding a macro.
   if (!isAtEnd() && Tokens[CurToken].is(tok::hashhash) && Macro) {
diff --git a/test/Preprocessor/assembler-with-cpp.c b/test/Preprocessor/assembler-with-cpp.c
index f03cb06..a007d34 100644
--- a/test/Preprocessor/assembler-with-cpp.c
+++ b/test/Preprocessor/assembler-with-cpp.c
@@ -8,7 +8,7 @@
 // Invalid token pasting is ok. 
 #define A X ## .
 1: A
-// CHECK-Identifiers-False: 1: X .
+// CHECK-Identifiers-False: 1: X.
 
 // Line markers are not linemarkers in .S files, they are passed through.
 # 321
@@ -42,15 +42,17 @@
 #define M5() M4 ## (
 
 5: M5()
-// CHECK-Identifiers-False: 5: expanded (
+// CHECK-Identifiers-False: 5: expanded(
 
 // rdar://6804322
 #define FOO(name)  name ## $foo
+#define blarg
 6: FOO(blarg)
-// CHECK-Identifiers-False: 6: blarg $foo
+// CHECK-Identifiers-False: 6: $foo
 
 // RUN: %clang_cc1 -x assembler-with-cpp -fdollars-in-identifiers -E %s -o - | FileCheck -check-prefix=CHECK-Identifiers-True -strict-whitespace %s
 #define FOO(name)  name ## $foo
+#define blarg
 7: FOO(blarg)
 // CHECK-Identifiers-True: 7: blarg$foo
 
@@ -63,9 +65,9 @@
 // CHECK-Identifiers-True: 9: T7 "foo"
 
 // Concatenation with period doesn't leave a space
-#define T8(A,B) A ## B
+#define T8(A,B) A ## B A ## T8
 10: T8(.,T8)
-// CHECK-Identifiers-True: 10: .T8
+// CHECK-Identifiers-True: 10: .T8 .T8
 
 // This should not crash.
 #define T11(a) #0
diff --git a/test/Preprocessor/macro_paste_identifier_error.c b/test/Preprocessor/macro_paste_identifier_error.c
index bba3172..038b054 100644
--- a/test/Preprocessor/macro_paste_identifier_error.c
+++ b/test/Preprocessor/macro_paste_identifier_error.c
@@ -5,4 +5,4 @@
 
 #define foo a ## b ## = 0
 int foo;
-// CHECK: int ab = 0;
+// CHECK: int ab= 0;

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Whitespace issues during preprocessing

Reply via email to