On 06/02/14 00:48, Harald van Dijk wrote: > On 06/02/14 00:13, Richard Smith wrote: >> On Wed, Feb 5, 2014 at 2:36 PM, Harald van Dijk <[email protected] >> <mailto:[email protected]>> wrote: >> >> On 04/02/14 22:24, Richard Smith wrote: >> > On Tue, Feb 4, 2014 at 1:13 PM, Harald van Dijk >> <[email protected] <mailto:[email protected]> >> > <mailto:[email protected] <mailto:[email protected]>>> wrote: >> > >> > On 04/02/14 20:25, Justin Bogner wrote: >> > > Harald van Dijk <[email protected] >> <mailto:[email protected]> <mailto:[email protected] >> <mailto:[email protected]>>> >> > writes: >> > >> Attached are my updated patches intended to fix various >> whitespace >> > >> issues in clang, modified to take Justin Bogner's comments into >> > account. >> > >> They are intended to ensure that whitespace is not >> inappropriately >> > >> removed just because a macro or macro argument expansion is >> > empty, and >> > >> does get removed during token pasting. >> > > >> > > I've committed the first four patches for you: r200785 through >> > r200788. >> > >> > Thanks! >> > >> > >> I have moved the handling of invalid token pastes from >> > >> TokenLexer::ExpandFunctionArguments to TokenLexer::Lex, so that >> > it works >> > >> for both object- and function-like macros, and both when ##'s >> > operands >> > >> use macro parameters and when they don't. >> > > >> > > Given that the tests needed to be changed and the behaviour >> clearly >> > > hasn't followed the comment in a while, I'm not entirely >> convinced >> > this >> > > is the right thing to do. Could the comment simply be wrong? Are >> > people >> > > relying on one behaviour or the other for invalid token >> pastes in >> > > assembler-with-cpp? >> > > >> > > Basically, is there a way to objectively say one of these >> > behaviours is >> > > better than the other? >> > >> > Having looked closer, the current behaviour is inconsistent in >> a way >> > that cannot really be explained to someone not familiar with a few >> > implementation details. Given >> > >> > #define foo(x) (. ## x . ## y) >> > >> > foo(y) actually expands to (.y . y) in assembler-with-cpp >> mode, with or >> > without my first four patches. That first result is what the >> comment >> > refers to; could the fact that the second result is different >> be an >> > oversight? There does not seem to be a reason for this >> difference, at >> > least. Surprisingly though, this is exactly what GCC does too. >> > >> > My last patch would have turned this into (.y .y), removing >> the second >> > space. That makes sense to me. (. y . y) could also be a perfectly >> > sensible result. >> > >> > >> > I think (.y .y) is probably the best answer here (consistently remove >> > whitespace on both sides of ## wherever it appears) -- this also >> > probably better matches what MSVC does (where token-paste can >> result in >> > multiple tokens, and seems to act as if the tokens were merely abutted >> > with no adjacent whitespace). This would also behave more consistently >> > when processing languages with a different lexical structure from C -- >> > in a language where an identifier can start with a period, (.y .y) >> seems >> > to unambiguously be the right answer. >> >> That is a good point about MSVC. There is actually one test case >> affected by this change, using -fms-extensions, where the behaviour did >> not match that of MSVC, and does now, at least for the compiler that >> comes with Visual Studio 2013. >> >> > I suspect, but do not actually know, that nobody really uses . >> ## foo >> > unless foo is a macro argument, because when foo is not a macro >> > argument, there is no point in using ## in the first place. It >> would >> > explain why this went unnoticed for so long. And I can imagine >> a few >> > cases where this would be useful: assembler directives that >> have subtly >> > different syntax, depending on the assembler used. x y would be >> > insufficient if x is ., if y is size, and if .size is a >> directive, but . >> > size is a syntax error. ## would work here. >> > >> > Even if clang's behaviour should change, though, my patch does >> not have >> > adequate testing for these special cases, so shouldn't be >> applied as is. >> > Should I work on better tests, or do you think it is more >> likely that >> > the behaviour should remain unchanged and get decent testing? >> > >> > >> > There's a risk of breaking someone's assumption by making a change >> here >> > (especially since we'd be introducing a difference from GCC) but I >> think >> > the new behavior is much more defensible, and there's probably no >> way to >> > find out without trying it. >> >> All right. I have now taken a closer look at the test cases where the >> behaviour changes, and noticed that for blargh ## $foo -> blargh $foo it >> would not be right to simply change the test case to blargh$foo, as that >> misses the point of the test. >> >> >> Right, I see, it's just making sure that -fdollars-in-identifiers allows >> the paste to work. I like your fix to the test here. > > Thanks :) > >> There was also already a test for . ## foo, but it only tested the case >> where foo is a macro argument. I have extended that test to also check >> what happens when foo is not a macro argument. >> >> >> Can you also add a test for the case where the . comes from a macro >> argument? May as well be thorough =) > > In both tests, the . comes from a macro argument, so I think you mean > add cases where it doesn't come from a macro argument? Sure. > >> How does the attached patch look? I've re-tested it on sources of today, >> but on top of my patch in the "r200787 - Fix whitespace handling in >> empty macro expansions" thread (after your okay for that). >> >> >> Checking whether the previous token was a ## worries me a little. What >> about cases like this: >> >> #define foo(x, y) x####y >> foo(, z) >> >> This expands to "## z" under our current left-to-right pasting strategy, >> but I think your patch drops the space. > > In that test, I think dropping the space is correct. Leading and > trailing white space in a macro argument is insignificant, which is why > > #define ID(x) x > #define FOO(x,y) ID(x)ID(y) > FOO( [ , ] ) > > should (and does) expand to > > [] > > Since there is no space between the second ## and the y, there should be > no space in the output. That said, I do see your point. If I change your > test to > > #define foo(x) x#### y > foo() > > then (still assuming left-to-right, otherwise the test won't work at > all) the output should be > > ## y > > and I am now getting > > ##y > >> Amusingly, "##z" appears to be >> the right answer under a right-to-left pasting strategy, so maybe that's >> OK? Both GCC and EDG expand this to just "z"; the standard is not >> spectacularly clear here, but I think we're right, because a ## produced >> by pasting a placemarker token onto a ## is not a token in the >> replacement list (that token is already gone). >> >> If you want to say that we don't care about this case, I could live with >> that =) > > The preprocessor is supposed to change ## from tok::hashhash to > tok::unknown in those cases during macro expansion where it is not a > paste, to avoid this and similar problems. For example, this altered > test does behave as desired: > > #define hash # > #define concat_(x, y) x ## y > #define concat(x, y) concat_(x, y) > #define bar() concat(hash, hash) y > bar() > > The preprocessor output with my patch is ## y, even though the y follows ##. > > This test also works: > > #define foo(x) x y > foo(##)
Ensuring that any remaining ## token is always tok::unknown turned out to be hugely impractical, in light of the fact that invalid token pastes may also occur in object-like macro expansions, where the RHS can equally be a ## token, and there is no ExpandFunctionArgs() or equivalent where the tokens can be checked before the actual expansion. Without an additional pass before expansion, checking whether a token is the RHS of an invalid paste becomes too complicated to be worth the effort, and remembering invalid pastes instead is easily done. That said... > I think the reason it does not happen for x#### y is because when the > LHS of a token paste is an empty macro argument, the RHS is copied > without actually performing any paste, and in that case, the change to > tok::unknown is skipped. It's tempting to say that we don't care, but I > think this affects more than a single space. I will look into this. ...it does affect more than a single space, and it exposes some interesting corner cases. #define foo(x,y) x ## ## y y foo(,) No matter which ## is interpreted first, the result should be ##. For purposes of special handling of empty argument pastes, the first appearance of y should not be treated as following a paste operator. At the same time, though, for purposes of pre-expansion of macro arguments, it should be: in foo(,foo(,)) (where it now matters that the scan for ## searches from left to right) the first y follows a ## token, so should not be pre-expanded. The second y does not, so should be. Normally, this would not matter, except when the later expansion would give different results, as it does here: later expansion occurs in the context of the expansion of foo, so the first nested foo is not expanded. The second is, because the argument was pre-expanded, so the result should be ## foo(,) ## I've only been able to get this result by tracking HashHashBefore and PasteBefore independently, where HashHashBefore indicates whether the token follows a ## token, and PasteBefore indicates whether that ## token was interpreted as an operator. Note that if clang actually used placemarker tokens, this would already be the natural result without special effort, but I understand that that may be impractical for other reasons. If I then take an extra look at my attempt to get whitespace handled consistently in invalid token pastes, I end up with a similar extra variable to track whether an attempted token paste has taken place, instead of checking tok::hashhash, just like you suggested. However, another problem is that when slightly modifying the example, to #define foo(x,y) w x ## ## y z foo(,) argument substitution should give w ## z, but the actual expansion should then not see the ## as a token paste operation. So even though TokenLexer::Lex should not check for tok::hashhash, the fact that ## is not changed to tok::unknown here is still a problem. I have almost-working patches that I hope to finish over the weekend. Cheers, Harald van Dijk _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
