Re: [cfe-commits] Patch: Add fix-it for digraph errors involving " <::"

Richard Trieu Tue, 05 Apr 2011 15:18:38 -0700

Taking into account Richard Smith's comments, I have made changes to the
patch:
1) Check for whitespace and newlines between '<:' and ':'
2) Add test case for #1
3) Update SourceLocation for '::'


On Mon, Apr 4, 2011 at 5:19 PM, Richard Trieu <[email protected]> wrote:

> Sorry for the mess up, but I accidentally did a reply instead of reply-all.
>  This includes my response to Richard Smith, his reply to me, and my reply
> to his reply.
>
> On Mon, Apr 4, 2011 at 4:39 PM, Richard Smith <[email protected]>wrote:
>
>> On Mon, April 4, 2011 22:57, Richard Trieu wrote:
>> > On Mon, Apr 4, 2011 at 1:19 PM, Richard Smith <[email protected]>
>> > wrote:
>> >> On Wed, 30 Mar 2011, Richard Trieu <[email protected]> wrote:
>> >>> Detect when the string "<::" is found in code after a cast or
>> >>> template name and is interpreted as "[:" because of the digraph "<:".
>> >>> When found,
>> >>> give an error with a fix-it to add whitespace between the "<" and
>> >>> "::".
>> >>> Also, repair the token stream and recover parsing afterwards.
>> >>
>> >> I don't think that this is the right fix. In C++0x, '<::' (not followed
>> >> by ':' or '>') is lexed as '<' followed by '::' (see [lex.pptoken]p3
>> >> bullet 2, or
>> >> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1104).
>> >> Implementing that rule (for both C++0x and earlier) seems to me like a
>> >> better fit, even though it breaks legal-but-pathological C++98 programs
>> >> like this:
>> >>
>> >> int a[42]; #define A(n) a<::##:n:>
>> >> int b = A(b);
>> >
>> > I don't think it's a good idea to break valid code.  Also, your link
>> > shows that fix in the lexer is not yet in the standard.
>>
>> Have a look at [lex.pptoken]p3 in N3242, the latest public draft of the
>> C++0x standard. It's there already :)
>>
> Ah, okay.  I see it now.  I thought that you had linked to a discussion of
> possible features.
>
>>
>> > Even if it were, it
>> > should only be applied to C++0x with the parser handling this error in
>> > other versions.
>>
>> I don't have a particularly strong opinion on this. Personally, I'm much
>> more interested in recovering nicely for code which accidentally contains
>> '<::' and means '< ::' than I am in preserving the semantics of
>> theoretical code where '<::' is supposed to be a digraph, but I'm not sure
>> what's the best fit for clang's goals (my preference would be that this is
>> an ExtWarn with a fixit outside of C++0x).
>>
>> If you still want to implement it this way, I have some comments on the
>> patch itself:
>>
>> It would be nice to also cover the comparison case: '::a <::b'.
>
> Maybe later.  This patch was limited to prevent unforeseen impacts to other
> code.
>
>>
>> Index: lib/Parse/ParseExprCXX.cpp
>> ===================================================================
>> --- lib/Parse/ParseExprCXX.cpp  (revision 128572)
>> +++ lib/Parse/ParseExprCXX.cpp  (working copy)
>> @@ -20,6 +20,45 @@
>>
>>  using namespace clang;
>>
>> +static int SelectDigraphErrorMessage(tok::TokenKind Kind) {
>> +  switch (Kind) {
>> +    case tok::kw_template:         return 0;
>> +    case tok::kw_const_cast:       return 1;
>> +    case tok::kw_dynamic_cast:     return 2;
>> +    case tok::kw_reinterpret_cast: return 3;
>> +    case tok::kw_static_cast:      return 4;
>> +    default:
>> +      assert(0 && "Unknown type for digraph error message.");
>> +      return -1;
>> +  }
>> +}
>> +
>> +// Suggest fixit for "<::" after a cast.
>> +static void FixDigraph(Parser &P, Preprocessor &PP, Token &DigraphToken,
>> +                       Token &ColonToken, tok::TokenKind Kind, bool
>> AtDigraph) {
>> +  // Pull '<:' and ':' off token stream.
>> +  if (!AtDigraph)
>> +    PP.Lex(DigraphToken);
>> +  PP.Lex(ColonToken);
>> +
>> +  SourceRange Range;
>> +  Range.setBegin(DigraphToken.getLocation());
>> +  Range.setEnd(ColonToken.getLocation());
>> +  P.Diag(DigraphToken.getLocation(),
>> diag::err_missing_whitespace_digraph)
>> +      << SelectDigraphErrorMessage(Kind)
>> +      << FixItHint::CreateReplacement(Range, "< ::");
>> +
>> +  // Reuse tokens to preserve source location and macro instantiation
>> +  // information.
>> +  ColonToken.setKind(tok::coloncolon);
>>
>> The source location for this token will be off by one character. Later
>> diagnostics could be confusing.
>
> The source location would be off by 1, pointing to the second colon instead
> of the first.  Would that cause confusion?
>
>>
>> +  DigraphToken.setKind(tok::less);
>> +
>> +  // Push new tokens back to token stream.
>> +  PP.EnterToken(ColonToken);
>> +  if (!AtDigraph)
>> +    PP.EnterToken(DigraphToken);
>> +}
>> +
>>  /// \brief Parse global scope or nested-name-specifier if present.
>>  ///
>>  /// Parses a C++ global scope specifier ('::') or nested-name-specifier
>> (which
>> @@ -287,6 +326,28 @@
>>       continue;
>>     }
>>
>> +    // Check for '<::' which should be '< ::' instead of '[:' when
>> following
>> +    // a template name.
>> +    if (Next.is(tok::l_square) && Next.getLength() == 2) {
>> +      Token SecondToken = GetLookAheadToken(2);
>> +      if (SecondToken.is(tok::colon)) {
>>
>> This recovery codepath should only be entered if the '[' and ':' are
>> adjacent in the source file.
>>
> I'll have a look into it. I'm guessing whitespace would throw off the
> parser?
>
>>
>> +        TemplateTy Template;
>> +        UnqualifiedId TemplateName;
>> +        TemplateName.setIdentifier(&II, Tok.getLocation());
>> +        bool MemberOfUnknownSpecialization;
>> +        if (Actions.isTemplateName(getCurScope(), SS,
>> +                                   /*hasTemplateKeyword=*/false,
>> +                                   TemplateName,
>> +                                   ObjectType,
>> +                                   EnteringContext,
>> +                                   Template,
>> +                                   MemberOfUnknownSpecialization)) {
>> +          FixDigraph(*this, PP, Next, SecondToken, tok::kw_template,
>> +                     /*AtDigraph*/false);
>> +        }
>> +      }
>> +    }
>> +
>>     // nested-name-specifier:
>>     //   type-name '<'
>>     if (Next.is(tok::less)) {
>> @@ -453,6 +514,13 @@
>>   SourceLocation OpLoc = ConsumeToken();
>>   SourceLocation LAngleBracketLoc = Tok.getLocation();
>>
>> +  // Check for "<::" which is parsed as "[:".  If found, fix token
>> stream,
>> +  // diagnose error, suggest fix, and recover parsing.
>> +  Token Next = NextToken();
>> +  if (Tok.is(tok::l_square) && Tok.getLength() == 2 &&
>> Next.is(tok::colon)) {
>>
>> Likewise here.
>>
>> +    FixDigraph(*this, PP, Tok, Next, Kind, /*AtDigraph*/true);
>> +  }
>> +
>>   if (ExpectAndConsume(tok::less, diag::err_expected_less_after,
>> CastName))
>>     return ExprError();
>>
>>
>> Incidentally, did you intentionally remove cfe-commits from CC?
>>
>
> No, I just clicked Reply instead of Reply-All.  They're like right next to
> each other.
>
>>
>> Richard
>>
>> Other Richard

Index: test/Parser/cxx-casting.cpp
===================================================================
--- test/Parser/cxx-casting.cpp	(revision 128572)
+++ test/Parser/cxx-casting.cpp	(working copy)
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -fsyntax-only %s
+// RUN: %clang_cc1 -fsyntax-only -verify %s
 
 char *const_cast_test(const char *var)
 {
@@ -37,3 +37,23 @@
   template <class T> class A {};
   void foo() { A<int>(*(A<int>*)0); }
 }
+
+typedef char* c;
+typedef A* a;
+void test2(char x, struct B * b) {
+  (void)const_cast<::c>(&x);  // expected-error{{found '<::' after a const_cast which forms the digraph '<:' (aka '[') and a ':', did you mean '< ::'?}}
+  (void)dynamic_cast<::a>(b);  // expected-error{{found '<::' after a dynamic_cast which forms the digraph '<:' (aka '[') and a ':', did you mean '< ::'?}}
+  (void)reinterpret_cast<::c>(x);  // expected-error{{found '<::' after a reinterpret_cast which forms the digraph '<:' (aka '[') and a ':', did you mean '< ::'?}}
+  (void)static_cast<::c>(&x);  // expected-error{{found '<::' after a static_cast which forms the digraph '<:' (aka '[') and a ':', did you mean '< ::'?}}
+
+  // Do not do digraph correction.
+  (void)static_cast<: :c>(&x); //\
+       expected-error {{expected '<' after 'static_cast'}} \
+       expected-error {{expected expression}}\
+       expected-error {{expected ']'}}\
+       expected-note {{to match this '['}}
+  (void)static_cast<: // expected-error {{expected '<' after 'static_cast'}} \
+                         expected-note {{to match this '['}}
+  :c>(&x); // expected-error {{expected expression}} \
+              expected-error {{expected ']'}}
+}
Index: test/SemaTemplate/temp_arg_template.cpp
===================================================================
--- test/SemaTemplate/temp_arg_template.cpp	(revision 128572)
+++ test/SemaTemplate/temp_arg_template.cpp	(working copy)
@@ -30,10 +30,13 @@
 
 A<f> *a9; // expected-error{{must be a class template}}
 
-// FIXME: The code below is ill-formed, because of the evil digraph '<:'. 
-// We should provide a much better error message than we currently do.
-// A<::N::Z> *a10;
+// Evil digraph '<:' is parsed as '[', expect error.
+A<::N::Z> *a10; // expected-error{{found '<::' after a template name which forms the digraph '<:' (aka '[') and a ':', did you mean '< ::'?}}
 
+// Do not do a digraph correction here.
+A<: :N::Z> *a11;  // expected-error{{expected expression}} \
+          expected-error{{C++ requires a type specifier for all declarations}}
+
 // PR7807
 namespace N {
   template <typename, typename = int> 
Index: include/clang/Basic/DiagnosticParseKinds.td
===================================================================
--- include/clang/Basic/DiagnosticParseKinds.td	(revision 128572)
+++ include/clang/Basic/DiagnosticParseKinds.td	(working copy)
@@ -410,6 +410,10 @@
 // C++ declarations
 def err_friend_decl_defines_class : Error<
   "cannot define a type in a friend declaration">;
+def err_missing_whitespace_digraph : Error<
+  "found '<::' after a "
+  "%select{template name|const_cast|dynamic_cast|reinterpret_cast|static_cast}0"
+  " which forms the digraph '<:' (aka '[') and a ':', did you mean '< ::'?">;
 
 def warn_deleted_function_accepted_as_extension: ExtWarn<
   "deleted function definition accepted as a C++0x extension">, InGroup<CXX0x>;
Index: lib/Parse/ParseExprCXX.cpp
===================================================================
--- lib/Parse/ParseExprCXX.cpp	(revision 128572)
+++ lib/Parse/ParseExprCXX.cpp	(working copy)
@@ -20,6 +20,47 @@
 
 using namespace clang;
 
+static int SelectDigraphErrorMessage(tok::TokenKind Kind) {
+  switch (Kind) {
+    case tok::kw_template:         return 0;
+    case tok::kw_const_cast:       return 1;
+    case tok::kw_dynamic_cast:     return 2;
+    case tok::kw_reinterpret_cast: return 3;
+    case tok::kw_static_cast:      return 4;
+    default:
+      assert(0 && "Unknown type for digraph error message.");
+      return -1;
+  }
+}
+
+// Suggest fixit for "<::" after a cast.
+static void FixDigraph(Parser &P, Preprocessor &PP, Token &DigraphToken,
+                       Token &ColonToken, tok::TokenKind Kind, bool AtDigraph) {
+  // Pull '<:' and ':' off token stream.
+  if (!AtDigraph)
+    PP.Lex(DigraphToken);
+  PP.Lex(ColonToken);
+
+  SourceRange Range;
+  Range.setBegin(DigraphToken.getLocation());
+  Range.setEnd(ColonToken.getLocation());
+  P.Diag(DigraphToken.getLocation(), diag::err_missing_whitespace_digraph)
+      << SelectDigraphErrorMessage(Kind)
+      << FixItHint::CreateReplacement(Range, "< ::");
+
+  // Update token information to reflect their change in token type.
+  ColonToken.setKind(tok::coloncolon);
+  ColonToken.setLocation(ColonToken.getLocation().getFileLocWithOffset(-1));
+  ColonToken.setLength(2);
+  DigraphToken.setKind(tok::less);
+  DigraphToken.setLength(1);
+
+  // Push new tokens back to token stream.
+  PP.EnterToken(ColonToken);
+  if (!AtDigraph)
+    PP.EnterToken(DigraphToken);
+}
+
 /// \brief Parse global scope or nested-name-specifier if present.
 ///
 /// Parses a C++ global scope specifier ('::') or nested-name-specifier (which
@@ -287,6 +328,29 @@
       continue;
     }
 
+    // Check for '<::' which should be '< ::' instead of '[:' when following
+    // a template name.
+    if (Next.is(tok::l_square) && Next.getLength() == 2) {
+      Token SecondToken = GetLookAheadToken(2);
+      if (SecondToken.is(tok::colon) && !SecondToken.hasLeadingSpace() &&
+          !SecondToken.isAtStartOfLine()) {
+        TemplateTy Template;
+        UnqualifiedId TemplateName;
+        TemplateName.setIdentifier(&II, Tok.getLocation());
+        bool MemberOfUnknownSpecialization;
+        if (Actions.isTemplateName(getCurScope(), SS,
+                                   /*hasTemplateKeyword=*/false,
+                                   TemplateName,
+                                   ObjectType,
+                                   EnteringContext,
+                                   Template,
+                                   MemberOfUnknownSpecialization)) {
+          FixDigraph(*this, PP, Next, SecondToken, tok::kw_template,
+                     /*AtDigraph*/false);
+        }
+      }
+    }
+
     // nested-name-specifier:
     //   type-name '<'
     if (Next.is(tok::less)) {
@@ -453,6 +517,14 @@
   SourceLocation OpLoc = ConsumeToken();
   SourceLocation LAngleBracketLoc = Tok.getLocation();
 
+  // Check for "<::" which is parsed as "[:".  If found, fix token stream,
+  // diagnose error, suggest fix, and recover parsing.
+  Token Next = NextToken();
+  if (Tok.is(tok::l_square) && Tok.getLength() == 2 && Next.is(tok::colon) &&
+      !Next.isAtStartOfLine() && !Next.hasLeadingSpace()) {
+    FixDigraph(*this, PP, Tok, Next, Kind, /*AtDigraph*/true);
+  }
+
   if (ExpectAndConsume(tok::less, diag::err_expected_less_after, CastName))
     return ExprError();

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] Patch: Add fix-it for digraph errors involving " <::"

Reply via email to