Hi! I'd like to ping this patch. As I wrote, it isn't a full solution for all the __VA_OPT__ issues, but it isn't even clear to me how exactly it should behave, but fixes some ICEs and a couple of most important issues and shouldn't make things worse, at least on the gcc and clang __VA_OPT__ testcases.
On Wed, Jan 10, 2018 at 01:04:07PM +0100, Jakub Jelinek wrote: > In libcpp, we have quite a lot of state on the token flags, some > related to the stuff that comes before the token (e.g. > PREV_FALLTHROUGH, PREV_WHITE and STRINGIFY_ARG), others related to the > stuff that comes after the token (e.g. PASTE_LEFT, SP_DIGRAPH, SP_PREV_WHITE). > Unfortunately, with the current __VA_OPT__ handling that information is > lost, because it is on the __VA_OPT__ or closing ) tokens that we are always > DROPing. > > The following patch attempts to fix various issues, including some ICEs, > by introducing 3 new states, two of them are alternatives to INCLUDE used > for the very first token after __VA_OPT__( , where we want to take into > account also flags from the __VA_OPT__ token, and before the closing ) > token where we want to use flags from the closing ) token. Plus > PADDING which is used for the case where there are no varargs and __VA_OPT__ > is supposed to fold into a placemarker, or for the case of __VA_OPT__(), > which is similar to that, in both cases we need to take into account in > those cases both flags from __VA_OPT__ and from the closing ). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > This is just a partial fix, one thing this patch doesn't change is that > the standard says that __VA_OPT__ ( contents ) should be treated as > parameter, which means that #__VA_OPT__ ( contents ) should stringify it, > which we right now reject. My preprocessor knowledge is too limited to > handle this right myself, including all the corner cases, e.g. one can have > #define f(x, ...) #__VA_OPT__(#x x ## x) etc.. I presume > m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE); > could be changed into: > m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE | STRINGIFY_ARG); > and when handling the PADDING result from update, we could just emit a > "" token, but for INCLUDE_FIRST with this we'd need something complex, > probably a new routine similar to stringify_arg to some extent. > > I've also cross-checked the libcpp implementation with this patch against > trunk clang which apparently also implements __VA_OPT__ now, on the > testcases included here the output is the same and on their > macro_vaopt_expand.cpp testcase, if I remove all tests that test > #__VA_OPT__ ( contents ) handling which we just reject now, there are still > some differences: > $ /usr/src/llvm/obj8/bin/clang++ -E /tmp/macro_vaopt_expand.cpp -std=c++2a > > /tmp/1 > $ ~/src/gcc/obj20/gcc/cc1plus -quiet -E /tmp/macro_vaopt_expand.cpp > -std=c++2a > /tmp/2 > diff -up /tmp/1 /tmp/2 > -4: f(0 ) > +4: f(0) > -6: f(0, a ) > -7: f(0, a ) > +6: f(0, a) > +7: f(0, a) > -9: TONG C ( ) B ( ) "A()" > +9: HT_A() C ( ) B ( ) "A()" > -16: S foo ; > +16: S foo; > -26: B1 > -26_1: B1 > +26: B 1 > +26_1: B 1 > -27: B11 > -27_1: BexpandedA0 11 > -28: B11 > +27: B 11 > +27_1: BA0 11 > +28: B 11 > > Perhaps some of the whitespace changes aren't significant, but 9:, and > 2[678]{,_1}: are significantly different. > 9: is > #define LPAREN ( > #define A() B LPAREN ) > #define B() C LPAREN ) > #define HT_B() TONG > #define F(x, ...) HT_ ## __VA_OPT__(x x A() #x) > 9: F(A(),1) > > Thoughts on what is right and why? > Similarly for expansion on the last token from __VA_OPT__ when followed > by ##, like: > #define m1 ( > #define f16() f17 m1 ) > #define f17() f18 m1 ) > #define f18() m2 m1 ) > #define m3f17() g > #define f19(x, ...) m3 ## __VA_OPT__(x x f16() #x) > #define f20(x, ...) __VA_OPT__(x x)##m4() > #define f21() f17 > #define f17m4() h > t25 f19 (f16 (), 1); > t26 f20 (f21 (), 2); > > E.g. 26: is: > #define F(a,...) __VA_OPT__(B a ## a) ## 1 > 26: F(,1) > I really wonder why clang emits B1 in that case, there > is no ## in between B and a, so those are 2 separate tokens > separated by whitespace, even when a ## a is a placemarker. > Does that mean __VA_OPT__ should throw away all the placemarkers > and return the last non-placemarker token for the ## handling? > > Can somebody please take the rest over? > > BTW, Tom, perhaps you should update your MAINTAINERS entry email address... > > 2018-01-10 Jakub Jelinek <ja...@redhat.com> > > PR preprocessor/83063 > PR preprocessor/83708 > * macro.c (enum macro_arg_token_kind): Fix comment typo. > (vaopt_state): Add m_flags field, reorder m_last_was_paste before > m_state. > (vaopt_state::vaopt_state): Adjust for the above changes. > (vaopt_state::update_flags): Add INCLUDE_FIRST, INCLUDE_LAST and > PADDING. > (vaopt_state::update): Add limit argument, update m_flags member, > return INCLUDE_FIRST instead of INCLUDE on the first and INCLUDE_LAST > instead of INCLUDE on the last token between __VA_OPT__( and ). > Returnn PADDING instead of DROP on the closing ) if no tokens were > INCLUDE*. Comment spelling fixes. > (vaopt_state::flags): New method. > (replace_args): Adjust vaopt_state::update caller. Fix handling > of __VA_OPT__ preceded or followed by ## and ensure no paste after > last __VA_OPT__ token if not followed by ##. Formatting fix. > (create_iso_definition): Adjust vaopt_state::update caller. > > * c-c++-common/cpp/va-opt-2.c: New test. > * c-c++-common/cpp/va-opt-3.c: New test. > > --- libcpp/macro.c.jj 2018-01-03 10:42:55.938763447 +0100 > +++ libcpp/macro.c 2018-01-09 17:27:23.603191796 +0100 > @@ -51,7 +51,7 @@ struct macro_arg > enum macro_arg_token_kind { > MACRO_ARG_TOKEN_NORMAL, > /* This is a macro argument token that got transformed into a string > - litteral, e.g. #foo. */ > + literal, e.g. #foo. */ > MACRO_ARG_TOKEN_STRINGIFIED, > /* This is a token resulting from the expansion of a macro > argument that was itself a macro. */ > @@ -105,10 +105,11 @@ class vaopt_state { > : m_pfile (pfile), > m_allowed (any_args), > m_variadic (is_variadic), > - m_state (0), > m_last_was_paste (false), > + m_state (0), > m_paste_location (0), > - m_location (0) > + m_location (0), > + m_flags (0) > { > } > > @@ -116,13 +117,16 @@ class vaopt_state { > { > ERROR, > DROP, > - INCLUDE > + INCLUDE, > + INCLUDE_FIRST, > + INCLUDE_LAST, > + PADDING > }; > > /* Given a token, update the state of this tracker and return a > boolean indicating whether the token should be be included in the > expansion. */ > - update_type update (const cpp_token *token) > + update_type update (const cpp_token *token, const cpp_token *limit) > { > /* If the macro isn't variadic, just don't bother. */ > if (!m_variadic) > @@ -139,6 +143,7 @@ class vaopt_state { > } > ++m_state; > m_location = token->src_loc; > + m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE); > return DROP; > } > else if (m_state == 1) > @@ -155,6 +160,7 @@ class vaopt_state { > } > else if (m_state >= 2) > { > + update_type ret = INCLUDE; > if (m_state == 2 && token->type == CPP_PASTE) > { > cpp_error_at (m_pfile, CPP_DL_ERROR, token->src_loc, > @@ -165,7 +171,10 @@ class vaopt_state { > case we see a close paren immediately after the open > paren. */ > if (m_state == 2) > - ++m_state; > + { > + ++m_state; > + ret = INCLUDE_FIRST; > + } > > bool was_paste = m_last_was_paste; > m_last_was_paste = false; > @@ -191,10 +200,27 @@ class vaopt_state { > return ERROR; > } > > - return DROP; > + if (ret == INCLUDE_FIRST) > + { > + m_flags |= token->flags & (PASTE_LEFT | SP_DIGRAPH > + | SP_PREV_WHITE); > + return PADDING; > + } > + return m_allowed ? DROP : PADDING; > } > } > - return m_allowed ? INCLUDE : DROP; > + if (limit > + && m_state == 3 > + && token + 1 < limit > + && token[1].type == CPP_CLOSE_PAREN) > + { > + if (m_allowed && ret != INCLUDE_FIRST) > + m_flags = 0; > + m_flags |= token[1].flags & (PASTE_LEFT | SP_DIGRAPH > + | SP_PREV_WHITE); > + ret = INCLUDE_LAST; > + } > + return m_allowed ? ret : DROP; > } > > /* Nothing to do with __VA_OPT__. */ > @@ -211,6 +237,12 @@ class vaopt_state { > return m_state == 0; > } > > + /* Return any flags that should be ored to the current token. */ > + unsigned short flags () > + { > + return m_flags; > + } > + > private: > > /* The cpp_reader. */ > @@ -220,6 +252,9 @@ class vaopt_state { > bool m_allowed; > /* True if the macro is variadic. */ > bool m_variadic; > + /* If true, the previous token was ##. This is used to detect when > + a paste occurs at the end of the sequence. */ > + bool m_last_was_paste; > > /* The state variable: > 0 means not parsing > @@ -228,14 +263,14 @@ class vaopt_state { > >= 3 means looking for ")", the number encodes the paren depth. */ > int m_state; > > - /* If true, the previous token was ##. This is used to detect when > - a paste occurs at the end of the sequence. */ > - bool m_last_was_paste; > /* The location of the paste token. */ > source_location m_paste_location; > > /* Location of the __VA_OPT__ token. */ > source_location m_location; > + > + /* The flags from the __VA_OPT__ token and closing ). */ > + unsigned short m_flags; > }; > > /* Macro expansion. */ > @@ -1817,9 +1852,9 @@ replace_args (cpp_reader *pfile, cpp_has > multiple tokens. This is to save memory at the expense of > accuracy. > > - Suppose we have #define SQARE(A) A * A > + Suppose we have #define SQUARE(A) A * A > > - And then we do SQARE(2+3) > + And then we do SQUARE(2+3) > > Then the tokens 2, +, 3, will have the same location, > saying they come from the expansion of the argument A. */ > @@ -1831,16 +1866,70 @@ replace_args (cpp_reader *pfile, cpp_has > i = 0; > vaopt_state vaopt_tracker (pfile, macro->variadic, > args[macro->paramc - 1].count > 0); > + unsigned short prev_flags = 0; > for (src = macro->exp.tokens; src < limit; src++) > { > unsigned int arg_tokens_count; > macro_arg_token_iter from; > const cpp_token **paste_flag = NULL; > const cpp_token **tmp_token_ptr; > + unsigned short flags = src->flags; > > /* __VA_OPT__ handling. */ > - if (vaopt_tracker.update (src) != vaopt_state::INCLUDE) > - continue; > + vaopt_state::update_type vaopt_state = vaopt_tracker.update (src, > limit); > + if (vaopt_state != vaopt_state::INCLUDE) > + { > + if (vaopt_state == vaopt_state::PADDING) > + { > + flags = vaopt_tracker.flags (); > + if ((flags & PASTE_LEFT) && (prev_flags & PASTE_LEFT)) > + continue; > + flags &= ~PASTE_LEFT; > + cpp_token *t = (cpp_token *) padding_token (pfile, src); > + unsigned index = expanded_token_index (pfile, macro, src, i); > + t->flags |= flags; > + /* Allocate a virtual location for the padding token and > + append the token and its location to BUFF and > + VIRT_LOCS. */ > + tokens_buff_add_token (buff, virt_locs, t, > + t->src_loc, t->src_loc, > + map, index); > + prev_flags = flags; > + continue; > + } > + else if (vaopt_state == vaopt_state::INCLUDE_FIRST > + || vaopt_state == vaopt_state::INCLUDE_LAST) > + { > + unsigned short vaopt_flags = vaopt_tracker.flags (); > + if (src->type != CPP_MACRO_ARG) > + { > + cpp_token *t = _cpp_temp_token (pfile); > + t->type = src->type; > + t->val = src->val; > + t->flags = flags | vaopt_flags; > + unsigned index = expanded_token_index (pfile, macro, src, i); > + tokens_buff_add_token (buff, virt_locs, t, > + t->src_loc, t->src_loc, map, index); > + i += 1; > + prev_flags = t->flags; > + /* Avoid paste on RHS, __VA_OPT__(c)d or > + __VA_OPT__(c)__VA_OPT__(d). */ > + if (!(vaopt_flags & PASTE_LEFT) > + && vaopt_state == vaopt_state::INCLUDE_LAST) > + { > + const cpp_token *t = &pfile->avoid_paste; > + tokens_buff_add_token (buff, virt_locs, > + t, t->src_loc, t->src_loc, > + NULL, 0); > + } > + continue; > + } > + else > + flags |= vaopt_flags; > + } > + else > + continue; > + } > > if (src->type != CPP_MACRO_ARG) > { > @@ -1852,6 +1941,7 @@ replace_args (cpp_reader *pfile, cpp_has > src->src_loc, src->src_loc, > map, index); > i += 1; > + prev_flags = flags; > continue; > } > > @@ -1866,7 +1956,7 @@ replace_args (cpp_reader *pfile, cpp_has > below is to initialize the iterator depending on the type of > tokens the macro argument has. It also does some adjustment > related to padding tokens and some pasting corner cases. */ > - if (src->flags & STRINGIFY_ARG) > + if (flags & STRINGIFY_ARG) > { > arg_tokens_count = 1; > macro_arg_token_iter_init (&from, > @@ -1875,7 +1965,7 @@ replace_args (cpp_reader *pfile, cpp_has > MACRO_ARG_TOKEN_STRINGIFIED, > arg, &arg->stringified); > } > - else if (src->flags & PASTE_LEFT) > + else if (flags & PASTE_LEFT) > { > arg_tokens_count = arg->count; > macro_arg_token_iter_init (&from, > @@ -1884,7 +1974,7 @@ replace_args (cpp_reader *pfile, cpp_has > MACRO_ARG_TOKEN_NORMAL, > arg, arg->first); > } > - else if (src != macro->exp.tokens && (src[-1].flags & PASTE_LEFT)) > + else if (prev_flags & PASTE_LEFT) > { > int num_toks; > arg_tokens_count = arg->count; > @@ -1936,7 +2026,7 @@ replace_args (cpp_reader *pfile, cpp_has > > /* Padding on the left of an argument (unless RHS of ##). */ > if ((!pfile->state.in_directive || > pfile->state.directive_wants_padding) > - && src != macro->exp.tokens && !(src[-1].flags & PASTE_LEFT)) > + && src != macro->exp.tokens && !(prev_flags & PASTE_LEFT)) > { > const cpp_token *t = padding_token (pfile, src); > unsigned index = expanded_token_index (pfile, macro, src, i); > @@ -1960,9 +2050,9 @@ replace_args (cpp_reader *pfile, cpp_has > save extra memory while tracking macro expansion > locations. So in that case here is what we do: > > - Suppose we have #define SQARE(A) A * A > + Suppose we have #define SQUARE(A) A * A > > - And then we do SQARE(2+3) > + And then we do SQUARE(2+3) > > Then the tokens 2, +, 3, will have the same location, > saying they come from the expansion of the argument > @@ -1970,9 +2060,9 @@ replace_args (cpp_reader *pfile, cpp_has > > So that means we are going to ignore the COUNT tokens > resulting from the expansion of the current macro > - arugment. In other words all the ARG_TOKENS_COUNT tokens > + argument. In other words all the ARG_TOKENS_COUNT tokens > resulting from the expansion of the macro argument will > - have the index I. Normally, each of those token should > + have the index I. Normally, each of those token should > have index I+J. */ > unsigned token_index = i; > unsigned index; > @@ -1989,9 +2079,9 @@ replace_args (cpp_reader *pfile, cpp_has > > /* With a non-empty argument on the LHS of ##, the last > token should be flagged PASTE_LEFT. */ > - if (src->flags & PASTE_LEFT) > - paste_flag = > - (const cpp_token **) tokens_buff_last_token_ptr (buff); > + if (flags & PASTE_LEFT) > + paste_flag > + = (const cpp_token **) tokens_buff_last_token_ptr (buff); > } > else if (CPP_PEDANTIC (pfile) && ! CPP_OPTION (pfile, c99) > && ! macro->syshdr && ! cpp_in_system_header (pfile)) > @@ -2021,7 +2111,7 @@ replace_args (cpp_reader *pfile, cpp_has > NODE_NAME (node), src->val.macro_arg.arg_no); > > /* Avoid paste on RHS (even case count == 0). */ > - if (!pfile->state.in_directive && !(src->flags & PASTE_LEFT)) > + if (!pfile->state.in_directive && !(flags & PASTE_LEFT)) > { > const cpp_token *t = &pfile->avoid_paste; > tokens_buff_add_token (buff, virt_locs, > @@ -2035,7 +2125,7 @@ replace_args (cpp_reader *pfile, cpp_has > cpp_token *token = _cpp_temp_token (pfile); > token->type = (*paste_flag)->type; > token->val = (*paste_flag)->val; > - if (src->flags & PASTE_LEFT) > + if (flags & PASTE_LEFT) > token->flags = (*paste_flag)->flags | PASTE_LEFT; > else > token->flags = (*paste_flag)->flags & ~PASTE_LEFT; > @@ -2043,6 +2133,7 @@ replace_args (cpp_reader *pfile, cpp_has > } > > i += arg_tokens_count; > + prev_flags = flags; > } > > if (track_macro_exp) > @@ -3304,7 +3395,7 @@ create_iso_definition (cpp_reader *pfile > } > } > > - if (vaopt_tracker.update (token) == vaopt_state::ERROR) > + if (vaopt_tracker.update (token, NULL) == vaopt_state::ERROR) > return false; > > following_paste_op = (token->type == CPP_PASTE); > --- gcc/testsuite/c-c++-common/cpp/va-opt-2.c.jj 2018-01-09 > 13:06:56.721345854 +0100 > +++ gcc/testsuite/c-c++-common/cpp/va-opt-2.c 2018-01-09 17:13:35.759095064 > +0100 > @@ -0,0 +1,41 @@ > +/* PR preprocessor/83063 */ > +/* { dg-do compile } */ > +/* { dg-options "-std=gnu99" { target c } } */ > +/* { dg-options "-std=c++2a" { target c++ } } */ > + > +#define f1(...) int b##__VA_OPT__(c) > +#define f2(...) int __VA_OPT__(c)##d > +#define f3(...) int e##__VA_OPT__() > +#define f4(...) int __VA_OPT__()##f > +#define f5(...) int g##__VA_OPT__(h)##i > +#define f6(...) int j##__VA_OPT__()##k > +#define f7(...) int l##__VA_OPT__() > +#define f8(...) int __VA_OPT__()##m > +#define f9(...) int n##__VA_OPT__()##o > +#define f10(x, ...) int x##__VA_OPT__(x) > +#define f11(x, ...) int __VA_OPT__(x)##x > +#define f12(x, ...) int x##__VA_OPT__(x)##x > +f1 (1, 2, 3); > +f1 (); > +f2 (1, 2); > +f2 (); > +f3 (1); > +f4 (2); > +f5 (6, 7); > +f5 (); > +f6 (8); > +f7 (); > +f8 (); > +f9 (); > +f10 (p, 5, 6); > +f10 (p); > +f11 (q, 7); > +f11 (q); > +f12 (r, 1, 2, 3, 4, 5); > +f12 (r); > + > +int > +main () > +{ > + return bc + b + cd + d + e + f + ghi + gi + jk + l + m + no + pp + p + qq > + q + rrr + rr; > +} > --- gcc/testsuite/c-c++-common/cpp/va-opt-3.c.jj 2018-01-09 > 17:22:32.609158440 +0100 > +++ gcc/testsuite/c-c++-common/cpp/va-opt-3.c 2018-01-09 17:20:14.985142201 > +0100 > @@ -0,0 +1,69 @@ > +/* PR preprocessor/83063 */ > +/* PR preprocessor/83708 */ > +/* { dg-do preprocess } */ > +/* { dg-options "-std=gnu99" { target c } } */ > +/* { dg-options "-std=c++2a" { target c++ } } */ > + > +#define f1(...) b##__VA_OPT__(c) > +#define f2(...) __VA_OPT__(c)##d > +#define f3(...) e##__VA_OPT__() > +#define f4(...) __VA_OPT__()##f > +#define f5(...) g##__VA_OPT__(h)##i > +#define f6(...) j##__VA_OPT__()##k > +#define f7(...) l##__VA_OPT__() > +#define f8(...) __VA_OPT__()##m > +#define f9(...) n##__VA_OPT__()##o > +#define f10(x, ...) x##__VA_OPT__(x) > +#define f11(x, ...) __VA_OPT__(x)##x > +#define f12(x, ...) x##__VA_OPT__(x)##x > +#define f13(...) __VA_OPT__(a)__VA_OPT__(b)c > +#define f14(a, b, c, ...) __VA_OPT__(a)__VA_OPT__(b)c > +#define f15(a, b, c, ...) __VA_OPT__(a b)__VA_OPT__(b c)a/**/__VA_OPT__(c a)a > +t1 f1 (1, 2, 3); > +/* { dg-final { scan-file va-opt-3.i "t1 bc;" } } */ > +t2 f1 (); > +/* { dg-final { scan-file va-opt-3.i "t2 b;" } } */ > +t3 f2 (1, 2); > +/* { dg-final { scan-file va-opt-3.i "t3 cd;" } } */ > +t4 f2 (); > +/* { dg-final { scan-file va-opt-3.i "t4 d;" } } */ > +t5 f3 (1); > +/* { dg-final { scan-file va-opt-3.i "t5 e;" } } */ > +t6 f4 (2); > +/* { dg-final { scan-file va-opt-3.i "t6 f;" } } */ > +t7 f5 (6, 7); > +/* { dg-final { scan-file va-opt-3.i "t7 ghi;" } } */ > +t8 f5 (); > +/* { dg-final { scan-file va-opt-3.i "t8 gi;" } } */ > +t9 f6 (8); > +/* { dg-final { scan-file va-opt-3.i "t9 jk;" } } */ > +t10 f7 (); > +/* { dg-final { scan-file va-opt-3.i "t10 l;" } } */ > +t11 f8 (); > +/* { dg-final { scan-file va-opt-3.i "t11 m;" } } */ > +t12 f9 (); > +/* { dg-final { scan-file va-opt-3.i "t12 no;" } } */ > +t13 f10 (p, 5, 6); > +/* { dg-final { scan-file va-opt-3.i "t13 pp;" } } */ > +t14 f10 (p); > +/* { dg-final { scan-file va-opt-3.i "t14 p;" } } */ > +t15 f11 (q, 7); > +/* { dg-final { scan-file va-opt-3.i "t15 qq;" } } */ > +t16 f11 (q); > +/* { dg-final { scan-file va-opt-3.i "t16 q;" } } */ > +t17 f12 (r, 1, 2, 3, 4, 5); > +/* { dg-final { scan-file va-opt-3.i "t17 rrr;" } } */ > +t18 f12 (r); > +/* { dg-final { scan-file va-opt-3.i "t18 rr;" } } */ > +t19 f13 (1, 2); > +/* { dg-final { scan-file va-opt-3.i "t19 a b c;" } } */ > +t20 f13 (); > +/* { dg-final { scan-file va-opt-3.i "t20 c;" } } */ > +t21 f14 (3, 4, 5, 2); > +/* { dg-final { scan-file va-opt-3.i "t21 3 4 5;" } } */ > +t22 f14 (3, 4, 5); > +/* { dg-final { scan-file va-opt-3.i "t22 5;" } } */ > +t23 f15 (6, 7, 8, 9); > +/* { dg-final { scan-file va-opt-3.i "t23 6 7 7 8 6 8 6 6;" } } */ > +t24 f15 (6, 7, 8); > +/* { dg-final { scan-file va-opt-3.i "t24 6 6;" } } */ Jakub