Re: [PATCH] Don't remove unknown tokens that are not whitespace (PR17215).

Alexander Kornienko Wed, 09 Oct 2013 09:51:47 -0700


================
Comment at: lib/Format/TokenAnnotator.cpp:573
@@ -571,1 +572,3 @@
   void determineTokenType(FormatToken &Current) {
+    if (Current.Previous && Current.Previous->Tok.is(tok::unknown))
+      Current.Type = TT_ImplicitStringLiteral;
----------------
Daniel Jasper wrote:
> Alexander Kornienko wrote:
> > Daniel Jasper wrote:
> > > Can this actually make a difference? If yes, how? It seems like we 
> > > convert tok::unknown to tok::string_literal or set Type = 
> > > TT_ImplicitStringLiteral earlier or merge this into another token's 
> > > whitespace..
> > I think, we want to preserve whitespace around non-whitespace unknown 
> > tokens. The most straightforward way to do this seems to be setting the 
> > type to TT_ImplicitStringLiteral for the token itself and the next one. The 
> > former is done in Format.cpp, and the latter - here.
> > 
> > And, btw, we don't convert tok::unknown to tok::string_literal in this 
> > specific case. I'm not sure if we should.
> Why would we want to preserve that whitespace? I'd just add the whitespace as 
> leading whitespace to the TT_ImplicitStringLiteral.
> Why would we want to preserve that whitespace?


Because it's hard to figure out the meaning of an unknown token. The example 
from http://llvm.org/PR17215 is:

  #define VER_QUALIFIER \x20(dbg)

Here we shouldn't insert a space between the backslash and "x20". And if a 
space was there, we shouldn't remove it.

> I'd just add the whitespace as leading whitespace to the 
> TT_ImplicitStringLiteral.

I'm not sure I understand what you mean here. I think, we need to avoid 
touching both leading and trailing whitespace of unknown tokens. And as we 
always attach whitespace to the following token, we need to deal with the next 
token as well. Or you see an easier way? Then could you explain it a bit more 
thoroughly?

================
Comment at: lib/Format/TokenAnnotator.cpp:575
@@ +574,3 @@
+      Current.Type = TT_ImplicitStringLiteral;
+    if (Current.Type != TT_Unknown)
+      return;
----------------
Daniel Jasper wrote:
> Alexander Kornienko wrote:
> > Daniel Jasper wrote:
> > > This change is not a no-op!! Move to line 608 and add:
> > > 
> > > // FIXME: Add tests that break if this gets moved up.
> > > 
> > > if no tests are failing now...
> > It wasn't meant to be a no-op. This check is here to preserve Current.Type 
> > if it is already set to TT_ImplicitStringLiteral. If moving the check here 
> > can really break something, we certainly need tests for this. I've added 
> > the FIXME line and changed the check to a more specific one.
> I think you are actually not achieving what you want to achieve. The code 
> between here and line 614 does not even touch Current's Type. It only changes 
> the type of previous tokens (which might still be bad). However, this check 
> in its current form should not change the behavior. If it does, what is the 
> test case?
I don't completely understand, why the "if (Current.Type == TT_Unknown)" check 
was where it was before, but it's back there now after your comment (and I've 
added the FIXME note you asked me to add), so the semantic of this part should 
be unchanged. What has changed, is the new code that handles the case of 
tok::unknown in the beginning of the function. As this case is quite distinct, 
I don't think it affects anything else. The test for this code is in place, and 
the code really does what it's meant to do (as in "removing the code breaks the 
test").

And I can change the check from the "early exit" style back to what it was 
before, if you prefer it to be that way, btw.


http://llvm-reviews.chandlerc.com/D1858
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Don't remove unknown tokens that are not whitespace (PR17215).

Reply via email to