[PATCH] D59885: [Lex] Allow to consume tokens while preprocessing

Richard Smith - zygoloid via Phabricator via cfe-commits Wed, 08 May 2019 11:45:54 -0700

rsmith added inline comments.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:870-900
+  TokenSource Source;
   do {
+    Source = TokenSource();
+
     switch (CurLexerKind) {
     case CLK_Lexer:
+      Source.InDirective = CurLexer->ParsingPreprocessorDirective;
----------------
ilya-biryukov wrote:
> rsmith wrote:
> > This is a lot of extra stuff to be doing in the main `Lex` loop. Adding one 
> > (perfectly-predicted) branch on `OnToken` seems like it should be fine, but 
> > this seems like a bit more added complexity than I'd prefer. I'd like some 
> > performance measurements of `-cc1 -Eonly` to see if this makes any real 
> > difference.
> Happy to do the measurements. Do you have a good collection of input files 
> for this in mind?
I think any large header should be fine for testing; I don't expect this to 
scale worse in the presence of lots of macro expansion or anything like that.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:896
     case CLK_LexAfterModuleImport:
-      ReturnedToken = LexAfterModuleImport(Result);
+      Source.InDirective = true;
+
----------------
ilya-biryukov wrote:
> rsmith wrote:
> > This isn't a directive; these are normal tokens.
> Sorry if it's a silly question, just wanted to clarify I'm not missing 
> anything here.
> 
> These tokens are the name of the module,  "import-suffix" and the semicolon 
> that follows it?
> And they end up being used to build the AST for `ImportDecl`?
Right. LexAfterModuleImport is a pass-through lexing layer that takes special 
action on the ObjC `@import module.name;` and C++20 `import "header";` 
declarations, which are just normal phase 4 tokens.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:956-957
   --LexLevel;
+  if (OnToken)
+    OnToken(Result, Source);
 }
----------------
ilya-biryukov wrote:
> ilya-biryukov wrote:
> > ilya-biryukov wrote:
> > > rsmith wrote:
> > > > This seems like it's going to be extremely hard to use. If you just 
> > > > want the expanded token stream, then removing all the other calls to 
> > > > `OnToken` and only calling it here when `LexLevel == 0` will give you 
> > > > that.
> > > I've tried `LexLevel == 0 && IsNewToken` and it **almost** works.
> > > The only trouble left is dealing with the delayed parsing. E.g. I'm 
> > > getting the callbacks for the same tokens multiple times in cases like 
> > > parsing of method bodies.
> > > 
> > > Any ideas on how to best tackle this?
> > > 
> > > The example is:
> > > ```
> > > struct X {
> > >   int method() { return 10; }
> > >   int a = 10;
> > > };
> > > ```
> > > 
> > > Seeing multiple callbacks for `{ return 10; }` and `= 10;`, want to see 
> > > only the first one.
> > It almost works now, see the update revision.
> > Almost no tokens get reported by the callback twice now, except the 
> > following case.
> > Input:
> > ```
> > struct X {
> >   int method() { return method(); }
> > };
> > ```
> > After the body of the class, we get a an extra callback for `)` token from 
> > one of the functions under `ParsedLexedMethodDef`.
> > This seems to be happening only for parenthesis.
> More precisely, this happens from a following sequence of calls:
> 
> ```
> Lex() {LexLevel=0}
> CachingLex() {LexLevel=1} // which does not have a token in `CachedTokens`, 
> so it recurses into ...
> Lex() {LexLevel=1}        // which results in a call to ...
> CurTokenLexer->Lex()
> ```
> 
> At this point `CurTokenLexer` returns a token of a pre-lexed method 
> definition, but the current implementation forgets that this was the case by 
> the time the token is processed at the the top of the callstack (`Lex() 
> {LexLevel=0}`).
> So the token ends up being reported by a callback.
Yeah, the parser messes with the token stream to support C++'s out-of-order 
parsing rule (among other things). Ignoring that here seems right. You might 
also want to make sure that calls to `EnterToken` are handled properly. The 
extra paren likely comes from such a call.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59885/new/

https://reviews.llvm.org/D59885



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59885: [Lex] Allow to consume tokens while preprocessing

Reply via email to