On Wed, 2015-12-09 at 18:44 +0100, Bernd Schmidt wrote: > On 12/09/2015 05:58 PM, David Malcolm wrote: > > On Wed, 2015-11-04 at 14:56 +0100, Bernd Schmidt wrote: > >> > >> This seems like fairly low impact but also low cost, so I'm fine with it > >> in principle. I wonder whether the length of the marker is the same > >> across all versions of patch (and VC tools)? > > > > It's hardcoded for GNU patch: > [...] > >From what I can tell, Perforce is the outlier here. > > Thanks for checking all that. > > >> Just thinking out loud - I guess it would be too much to hope for to > >> share lexers between frontends so that we need only one copy of this? > > > > Probably :( > > Someone slap sense into me, I just thought of deriving C and C++ parsers > from a common base class... (no this is not a suggestion for this patch). > > > Would a better wording be: > > > > extern short some_var; /* This line would lead to a warning due to the > > duplicate name, but it is skipped when handling > > the conflict marker. */ > > I think so, yes. > > > That said, it's not clear they're always at the beginning of a line; > > this bazaar bug indicates that CVS (and bazaar) can emit them > > mid-line: > > https://bugs.launchpad.net/bzr/+bug/36399 > > Ok. CVS I think we shouldn't worry about, and it looks like this is one > particular bug/corner case where the conflict end marker is the last > thing in the file. I think on the whole it's best to check for beginning > of the line as you've done. > > > Wording-wise, should it be "merge conflict marker", rather > > than "patch conflict marker"? > > > > Clang spells it: > > "error: version control conflict marker in file" > > http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#merge_conflicts > > Yeah, if another compiler has a similar/identical diagnostic I think we > should just copy that unless there's a very good reason not to. > > > Rebased on top of r231445 (from yesterday). > > Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. > > Adds 82 new PASSes to g++.sum and 27 new PASSes to gcc.sum. > > > > OK for trunk? > > I'm inclined to say yes since it was originally submitted in time and > it's hard to imagine how the change could be risky (I'll point out right > away that there are one or two other patches in the queue that were also > submitted in time which I feel should not be considered for gcc-6 at > this point due to risk). > > Let's wait until the end of the week for objections, commit then.
Thanks. I updated it based on the feedback above, including changing the wording to match clang's: "error: version control conflict marker in file" I replaced "patch conflict marker" with "conflict marker" in the code and names of test cases. I took the liberty of adding: gcc_assert (n > 0); to c_parser_peek_nth_token based on Martin's feedback. Having verified bootstrap®rtest (on x86_64-pc-linux-gnu), I've committed it to trunk as r231712. I'm attaching what I committed, for reference.
Index: gcc/c-family/ChangeLog =================================================================== --- gcc/c-family/ChangeLog (revision 231711) +++ gcc/c-family/ChangeLog (revision 231712) @@ -1,3 +1,8 @@ +2015-12-16 David Malcolm <dmalc...@redhat.com> + + * c-common.h (conflict_marker_get_final_tok_kind): New prototype. + * c-lex.c (conflict_marker_get_final_tok_kind): New function. + 2015-12-15 Ilya Verbin <ilya.ver...@intel.com> * c-common.c (c_common_attribute_table): Handle "omp declare target Index: gcc/c-family/c-lex.c =================================================================== --- gcc/c-family/c-lex.c (revision 231711) +++ gcc/c-family/c-lex.c (revision 231712) @@ -1263,3 +1263,29 @@ return value; } + +/* Helper function for c_parser_peek_conflict_marker + and cp_lexer_peek_conflict_marker. + Given a possible conflict marker token of kind TOK1_KIND + consisting of a pair of characters, get the token kind for the + standalone final character. */ + +enum cpp_ttype +conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind) +{ + switch (tok1_kind) + { + default: gcc_unreachable (); + case CPP_LSHIFT: + /* "<<" and '<' */ + return CPP_LESS; + + case CPP_EQ_EQ: + /* "==" and '=' */ + return CPP_EQ; + + case CPP_RSHIFT: + /* ">>" and '>' */ + return CPP_GREATER; + } +} Index: gcc/c-family/c-common.h =================================================================== --- gcc/c-family/c-common.h (revision 231711) +++ gcc/c-family/c-common.h (revision 231712) @@ -1089,6 +1089,10 @@ extern int c_gimplify_expr (tree *, gimple_seq *, gimple_seq *); extern tree c_build_bind_expr (location_t, tree, tree); +/* In c-lex.c. */ +extern enum cpp_ttype +conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind); + /* In c-pch.c */ extern void pch_init (void); extern void pch_cpp_save_state (void); Index: gcc/c/ChangeLog =================================================================== --- gcc/c/ChangeLog (revision 231711) +++ gcc/c/ChangeLog (revision 231712) @@ -1,5 +1,13 @@ 2015-12-16 David Malcolm <dmalc...@redhat.com> + * c-parser.c (struct c_parser): Expand array "tokens_buf" from 2 + to 4. + (c_parser_peek_nth_token): New function. + (c_parser_peek_conflict_marker): New function. + (c_parser_error): Detect conflict markers and report them as such. + +2015-12-16 David Malcolm <dmalc...@redhat.com> + * c-parser.c (c_parser_postfix_expression): Use EXPR_LOC_OR_LOC to preserve range information for the primary expression in the call to c_parser_postfix_expression_after_primary. Index: gcc/c/c-parser.c =================================================================== --- gcc/c/c-parser.c (revision 231711) +++ gcc/c/c-parser.c (revision 231712) @@ -202,8 +202,8 @@ /* The look-ahead tokens. */ c_token * GTY((skip)) tokens; /* Buffer for look-ahead tokens. */ - c_token tokens_buf[2]; - /* How many look-ahead tokens are available (0, 1 or 2, or + c_token tokens_buf[4]; + /* How many look-ahead tokens are available (0 - 4, or more if parsing from pre-lexed tokens). */ unsigned int tokens_avail; /* True if a syntax error is being recovered from; false otherwise. @@ -492,6 +492,23 @@ return &parser->tokens[1]; } +/* Return a pointer to the Nth token from PARSER, reading it + in if necessary. The N-1th token is already read in. */ + +static c_token * +c_parser_peek_nth_token (c_parser *parser, unsigned int n) +{ + /* N is 1-based, not zero-based. */ + gcc_assert (n > 0); + + if (parser->tokens_avail >= n) + return &parser->tokens[n - 1]; + gcc_assert (parser->tokens_avail == n - 1); + c_lex_one_token (parser, &parser->tokens[n - 1]); + parser->tokens_avail = n; + return &parser->tokens[n - 1]; +} + /* Return true if TOKEN can start a type name, false otherwise. */ static bool @@ -829,6 +846,46 @@ } } +/* Helper function for c_parser_error. + Having peeked a token of kind TOK1_KIND that might signify + a conflict marker, peek successor tokens to determine + if we actually do have a conflict marker. + Specifically, we consider a run of 7 '<', '=' or '>' characters + at the start of a line as a conflict marker. + These come through the lexer as three pairs and a single, + e.g. three CPP_LSHIFT ("<<") and a CPP_LESS ('<'). + If it returns true, *OUT_LOC is written to with the location/range + of the marker. */ + +static bool +c_parser_peek_conflict_marker (c_parser *parser, enum cpp_ttype tok1_kind, + location_t *out_loc) +{ + c_token *token2 = c_parser_peek_2nd_token (parser); + if (token2->type != tok1_kind) + return false; + c_token *token3 = c_parser_peek_nth_token (parser, 3); + if (token3->type != tok1_kind) + return false; + c_token *token4 = c_parser_peek_nth_token (parser, 4); + if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind)) + return false; + + /* It must be at the start of the line. */ + location_t start_loc = c_parser_peek_token (parser)->location; + if (LOCATION_COLUMN (start_loc) != 1) + return false; + + /* We have a conflict marker. Construct a location of the form: + <<<<<<< + ^~~~~~~ + with start == caret, finishing at the end of the marker. */ + location_t finish_loc = get_finish (token4->location); + *out_loc = make_location (start_loc, start_loc, finish_loc); + + return true; +} + /* Issue a diagnostic of the form FILE:LINE: MESSAGE before TOKEN where TOKEN is the next token in the input stream of PARSER. @@ -850,6 +907,20 @@ parser->error = true; if (!gmsgid) return; + + /* If this is actually a conflict marker, report it as such. */ + if (token->type == CPP_LSHIFT + || token->type == CPP_RSHIFT + || token->type == CPP_EQ_EQ) + { + location_t loc; + if (c_parser_peek_conflict_marker (parser, token->type, &loc)) + { + error_at (loc, "version control conflict marker in file"); + return; + } + } + /* This diagnostic makes more sense if it is tagged to the line of the token we just peeked at. */ c_parser_set_source_position_from_token (token); Index: gcc/testsuite/ChangeLog =================================================================== --- gcc/testsuite/ChangeLog (revision 231711) +++ gcc/testsuite/ChangeLog (revision 231712) @@ -1,5 +1,20 @@ 2015-12-16 David Malcolm <dmalc...@redhat.com> + * c-c++-common/conflict-markers-1.c: New testcase. + * c-c++-common/conflict-markers-2.c: Likewise. + * c-c++-common/conflict-markers-3.c: Likewise. + * c-c++-common/conflict-markers-4.c: Likewise. + * c-c++-common/conflict-markers-5.c: Likewise. + * c-c++-common/conflict-markers-6.c: Likewise. + * c-c++-common/conflict-markers-7.c: Likewise. + * c-c++-common/conflict-markers-8.c: Likewise. + * c-c++-common/conflict-markers-9.c: Likewise. + * c-c++-common/conflict-markers-10.c: Likewise. + * c-c++-common/conflict-markers-11.c: Likewise. + * g++.dg/conflict-markers-1.C: Likewise. + +2015-12-16 David Malcolm <dmalc...@redhat.com> + * gcc.dg/cast-function-1.c (bar): Update column numbers. * gcc.dg/diagnostic-range-bad-called-object.c: New test case. Index: gcc/testsuite/g++.dg/conflict-markers-1.C =================================================================== --- gcc/testsuite/g++.dg/conflict-markers-1.C (revision 0) +++ gcc/testsuite/g++.dg/conflict-markers-1.C (revision 231712) @@ -0,0 +1,13 @@ +/* Ensure that we don't complain about conflict markers on + valid template argument lists, valid in C++11 onwards. */ +// { dg-options "-std=c++11" } + +template <typename T> +struct foo +{ + T t; +}; + +foo <foo <foo <foo <foo <foo <foo <int +>>>>>>> f; +// The above line is valid C++11, and isn't a conflict marker Index: gcc/testsuite/c-c++-common/conflict-markers-11.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-11.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-11.c (revision 231712) @@ -0,0 +1,14 @@ +/* Verify that we only report conflict markers at the start of lines. */ +int p; + + <<<<<<< HEAD /* { dg-error "expected identifier|expected unqualified-id" } */ + +int q; + + ======= /* { dg-error "expected identifier|expected unqualified-id" } */ + +int r; + + >>>>>>> Some commit message /* { dg-error "expected identifier|expected unqualified-id" } */ + +int s; Index: gcc/testsuite/c-c++-common/conflict-markers-4.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-4.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-4.c (revision 231712) @@ -0,0 +1,11 @@ +/* Ensure we can handle mismatched conflict markers. */ + +int p; + +>>>>>>> Some commit message /* { dg-error "conflict marker" } */ + +int q; + +>>>>>>> Some other commit message /* { dg-error "conflict marker" } */ + +int r; Index: gcc/testsuite/c-c++-common/conflict-markers-5.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-5.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-5.c (revision 231712) @@ -0,0 +1,11 @@ +/* Ensure we can handle mismatched conflict markers. */ + +int p; + +======= /* { dg-error "conflict marker" } */ + +int q; + +======= /* { dg-error "conflict marker" } */ + +int r; Index: gcc/testsuite/c-c++-common/conflict-markers-6.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-6.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-6.c (revision 231712) @@ -0,0 +1,38 @@ +/* Branch coverage of conflict marker detection: + none of these should be reported as conflict markers. */ + +int a0; + +<< HEAD /* { dg-error "expected" } */ + +int a1; + +<<<< HEAD /* { dg-error "expected" } */ + +int a2; + +<<<<<< HEAD /* { dg-error "expected" } */ + +int b0; + +== HEAD /* { dg-error "expected" } */ + +int b1; + +==== HEAD /* { dg-error "expected" } */ + +int b2; + +====== HEAD /* { dg-error "expected" } */ + +int c0; + +>> HEAD /* { dg-error "expected" } */ + +int c1; + +>>>> HEAD /* { dg-error "expected" } */ + +int c2; + +>>>>>> HEAD /* { dg-error "expected" } */ Index: gcc/testsuite/c-c++-common/conflict-markers-7.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-7.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-7.c (revision 231712) @@ -0,0 +1,6 @@ +/* It's valid to stringize the "<<<<<<<"; don't + report it as a conflict marker. */ +#define str(s) #s +const char *s = str( +<<<<<<< +); Index: gcc/testsuite/c-c++-common/conflict-markers-8.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-8.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-8.c (revision 231712) @@ -0,0 +1,4 @@ +/* A macro that's never expanded shouldn't be reported as a + conflict marker. */ +#define foo \ +<<<<<<< Index: gcc/testsuite/c-c++-common/conflict-markers-1.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-1.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-1.c (revision 231712) @@ -0,0 +1,11 @@ +int p; + +<<<<<<< HEAD /* { dg-error "conflict marker" } */ +extern int some_var; +======= /* { dg-error "conflict marker" } */ +extern short some_var; /* This line would lead to a warning due to the + duplicate name, but it is skipped when handling + the conflict marker. */ +>>>>>>> Some commit message /* { dg-error "conflict marker" } */ + +int q; Index: gcc/testsuite/c-c++-common/conflict-markers-9.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-9.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-9.c (revision 231712) @@ -0,0 +1,8 @@ +/* It's valid to have +<<<<<<< + inside both + comments (as above), and within string literals. */ +const char *s = "\ +<<<<<<<"; + +/* The above shouldn't be reported as errors. */ Index: gcc/testsuite/c-c++-common/conflict-markers-2.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-2.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-2.c (revision 231712) @@ -0,0 +1,2 @@ +/* This should not be flagged as a conflict marker. */ +const char *msg = "<<<<<<< "; Index: gcc/testsuite/c-c++-common/conflict-markers-10.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-10.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-10.c (revision 231712) @@ -0,0 +1,25 @@ +/* { dg-options "-fdiagnostics-show-caret" } */ + +<<<<<<< HEAD /* { dg-error "conflict marker" } */ +/* { dg-begin-multiline-output "" } + <<<<<<< HEAD + ^~~~~~~ + { dg-end-multiline-output "" } */ + +extern int some_var; + +======= /* { dg-error "conflict marker" } */ +/* { dg-begin-multiline-output "" } + ======= + ^~~~~~~ + { dg-end-multiline-output "" } */ + +extern short some_var; /* This line would lead to a warning due to the + duplicate name, but it is skipped when handling + the conflict marker. */ + +>>>>>>> Some commit message /* { dg-error "conflict marker" } */ +/* { dg-begin-multiline-output "" } + >>>>>>> + ^~~~~~~ + { dg-end-multiline-output "" } */ Index: gcc/testsuite/c-c++-common/conflict-markers-3.c =================================================================== --- gcc/testsuite/c-c++-common/conflict-markers-3.c (revision 0) +++ gcc/testsuite/c-c++-common/conflict-markers-3.c (revision 231712) @@ -0,0 +1,11 @@ +/* Ensure we can handle unterminated conflict markers. */ + +int p; + +<<<<<<< HEAD /* { dg-error "conflict marker" } */ + +int q; + +<<<<<<< HEAD /* { dg-error "conflict marker" } */ + +int r; Index: gcc/cp/ChangeLog =================================================================== --- gcc/cp/ChangeLog (revision 231711) +++ gcc/cp/ChangeLog (revision 231712) @@ -1,3 +1,9 @@ +2015-12-16 David Malcolm <dmalc...@redhat.com> + + * parser.c (cp_lexer_peek_conflict_marker): New function. + (cp_parser_error): Detect conflict markers and report them as + such. + 2015-12-15 Martin Sebor <mse...@redhat.com> c++/42121 Index: gcc/cp/parser.c =================================================================== --- gcc/cp/parser.c (revision 231711) +++ gcc/cp/parser.c (revision 231712) @@ -2689,6 +2689,46 @@ return token->keyword == keyword; } +/* Helper function for cp_parser_error. + Having peeked a token of kind TOK1_KIND that might signify + a conflict marker, peek successor tokens to determine + if we actually do have a conflict marker. + Specifically, we consider a run of 7 '<', '=' or '>' characters + at the start of a line as a conflict marker. + These come through the lexer as three pairs and a single, + e.g. three CPP_LSHIFT tokens ("<<") and a CPP_LESS token ('<'). + If it returns true, *OUT_LOC is written to with the location/range + of the marker. */ + +static bool +cp_lexer_peek_conflict_marker (cp_lexer *lexer, enum cpp_ttype tok1_kind, + location_t *out_loc) +{ + cp_token *token2 = cp_lexer_peek_nth_token (lexer, 2); + if (token2->type != tok1_kind) + return false; + cp_token *token3 = cp_lexer_peek_nth_token (lexer, 3); + if (token3->type != tok1_kind) + return false; + cp_token *token4 = cp_lexer_peek_nth_token (lexer, 4); + if (token4->type != conflict_marker_get_final_tok_kind (tok1_kind)) + return false; + + /* It must be at the start of the line. */ + location_t start_loc = cp_lexer_peek_token (lexer)->location; + if (LOCATION_COLUMN (start_loc) != 1) + return false; + + /* We have a conflict marker. Construct a location of the form: + <<<<<<< + ^~~~~~~ + with start == caret, finishing at the end of the marker. */ + location_t finish_loc = get_finish (token4->location); + *out_loc = make_location (start_loc, start_loc, finish_loc); + + return true; +} + /* If not parsing tentatively, issue a diagnostic of the form FILE:LINE: MESSAGE before TOKEN where TOKEN is the next token in the input stream. MESSAGE @@ -2713,6 +2753,19 @@ return; } + /* If this is actually a conflict marker, report it as such. */ + if (token->type == CPP_LSHIFT + || token->type == CPP_RSHIFT + || token->type == CPP_EQ_EQ) + { + location_t loc; + if (cp_lexer_peek_conflict_marker (parser->lexer, token->type, &loc)) + { + error_at (loc, "version control conflict marker in file"); + return; + } + } + c_parse_error (gmsgid, /* Because c_parser_error does not understand CPP_KEYWORD, keywords are treated like