On 06/02/14 00:13, Richard Smith wrote: > On Wed, Feb 5, 2014 at 2:36 PM, Harald van Dijk <[email protected] > <mailto:[email protected]>> wrote: > > On 04/02/14 22:24, Richard Smith wrote: > > On Tue, Feb 4, 2014 at 1:13 PM, Harald van Dijk > <[email protected] <mailto:[email protected]> > > <mailto:[email protected] <mailto:[email protected]>>> wrote: > > > > On 04/02/14 20:25, Justin Bogner wrote: > > > Harald van Dijk <[email protected] > <mailto:[email protected]> <mailto:[email protected] > <mailto:[email protected]>>> > > writes: > > >> Attached are my updated patches intended to fix various > whitespace > > >> issues in clang, modified to take Justin Bogner's comments into > > account. > > >> They are intended to ensure that whitespace is not > inappropriately > > >> removed just because a macro or macro argument expansion is > > empty, and > > >> does get removed during token pasting. > > > > > > I've committed the first four patches for you: r200785 through > > r200788. > > > > Thanks! > > > > >> I have moved the handling of invalid token pastes from > > >> TokenLexer::ExpandFunctionArguments to TokenLexer::Lex, so that > > it works > > >> for both object- and function-like macros, and both when ##'s > > operands > > >> use macro parameters and when they don't. > > > > > > Given that the tests needed to be changed and the behaviour > clearly > > > hasn't followed the comment in a while, I'm not entirely > convinced > > this > > > is the right thing to do. Could the comment simply be wrong? Are > > people > > > relying on one behaviour or the other for invalid token > pastes in > > > assembler-with-cpp? > > > > > > Basically, is there a way to objectively say one of these > > behaviours is > > > better than the other? > > > > Having looked closer, the current behaviour is inconsistent in > a way > > that cannot really be explained to someone not familiar with a few > > implementation details. Given > > > > #define foo(x) (. ## x . ## y) > > > > foo(y) actually expands to (.y . y) in assembler-with-cpp > mode, with or > > without my first four patches. That first result is what the > comment > > refers to; could the fact that the second result is different > be an > > oversight? There does not seem to be a reason for this > difference, at > > least. Surprisingly though, this is exactly what GCC does too. > > > > My last patch would have turned this into (.y .y), removing > the second > > space. That makes sense to me. (. y . y) could also be a perfectly > > sensible result. > > > > > > I think (.y .y) is probably the best answer here (consistently remove > > whitespace on both sides of ## wherever it appears) -- this also > > probably better matches what MSVC does (where token-paste can > result in > > multiple tokens, and seems to act as if the tokens were merely abutted > > with no adjacent whitespace). This would also behave more consistently > > when processing languages with a different lexical structure from C -- > > in a language where an identifier can start with a period, (.y .y) > seems > > to unambiguously be the right answer. > > That is a good point about MSVC. There is actually one test case > affected by this change, using -fms-extensions, where the behaviour did > not match that of MSVC, and does now, at least for the compiler that > comes with Visual Studio 2013. > > > I suspect, but do not actually know, that nobody really uses . > ## foo > > unless foo is a macro argument, because when foo is not a macro > > argument, there is no point in using ## in the first place. It > would > > explain why this went unnoticed for so long. And I can imagine > a few > > cases where this would be useful: assembler directives that > have subtly > > different syntax, depending on the assembler used. x y would be > > insufficient if x is ., if y is size, and if .size is a > directive, but . > > size is a syntax error. ## would work here. > > > > Even if clang's behaviour should change, though, my patch does > not have > > adequate testing for these special cases, so shouldn't be > applied as is. > > Should I work on better tests, or do you think it is more > likely that > > the behaviour should remain unchanged and get decent testing? > > > > > > There's a risk of breaking someone's assumption by making a change > here > > (especially since we'd be introducing a difference from GCC) but I > think > > the new behavior is much more defensible, and there's probably no > way to > > find out without trying it. > > All right. I have now taken a closer look at the test cases where the > behaviour changes, and noticed that for blargh ## $foo -> blargh $foo it > would not be right to simply change the test case to blargh$foo, as that > misses the point of the test. > > > Right, I see, it's just making sure that -fdollars-in-identifiers allows > the paste to work. I like your fix to the test here.
Thanks :) > There was also already a test for . ## foo, but it only tested the case > where foo is a macro argument. I have extended that test to also check > what happens when foo is not a macro argument. > > > Can you also add a test for the case where the . comes from a macro > argument? May as well be thorough =) In both tests, the . comes from a macro argument, so I think you mean add cases where it doesn't come from a macro argument? Sure. > How does the attached patch look? I've re-tested it on sources of today, > but on top of my patch in the "r200787 - Fix whitespace handling in > empty macro expansions" thread (after your okay for that). > > > Checking whether the previous token was a ## worries me a little. What > about cases like this: > > #define foo(x, y) x####y > foo(, z) > > This expands to "## z" under our current left-to-right pasting strategy, > but I think your patch drops the space. In that test, I think dropping the space is correct. Leading and trailing white space in a macro argument is insignificant, which is why #define ID(x) x #define FOO(x,y) ID(x)ID(y) FOO( [ , ] ) should (and does) expand to [] Since there is no space between the second ## and the y, there should be no space in the output. That said, I do see your point. If I change your test to #define foo(x) x#### y foo() then (still assuming left-to-right, otherwise the test won't work at all) the output should be ## y and I am now getting ##y > Amusingly, "##z" appears to be > the right answer under a right-to-left pasting strategy, so maybe that's > OK? Both GCC and EDG expand this to just "z"; the standard is not > spectacularly clear here, but I think we're right, because a ## produced > by pasting a placemarker token onto a ## is not a token in the > replacement list (that token is already gone). > > If you want to say that we don't care about this case, I could live with > that =) The preprocessor is supposed to change ## from tok::hashhash to tok::unknown in those cases during macro expansion where it is not a paste, to avoid this and similar problems. For example, this altered test does behave as desired: #define hash # #define concat_(x, y) x ## y #define concat(x, y) concat_(x, y) #define bar() concat(hash, hash) y bar() The preprocessor output with my patch is ## y, even though the y follows ##. This test also works: #define foo(x) x y foo(##) I think the reason it does not happen for x#### y is because when the LHS of a token paste is an empty macro argument, the RHS is copied without actually performing any paste, and in that case, the change to tok::unknown is skipped. It's tempting to say that we don't care, but I think this affects more than a single space. I will look into this. Cheers, Harald van Dijk _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
