[clang] [analyzer][CTU] Macro expansions for imported translation units (PR #176126)

Vladimir Vuksanovic via cfe-commits Wed, 04 Feb 2026 05:42:00 -0800

vvuksanovic wrote:

> Theoretically, this macro expansion context is a hack. The `MacroExpansion` 
> hierarchy should perfectly describe how macros are expanded - I just never 
> bothered learning how to do that, and slapping a token watcher worked well so 
> far. Ideally, we wouldn't need a token watcher, and we should just lean on 
> the `MacroExpansions` somehow.
> 
> Serialising this redundant information (the token sequence) into a PCH is not 
> elegant, because we start leaking our wacky implementation, and if possible, 
> I'd strive for not doing that.


As far as I can tell, that information isn't redundant. Macro expansions are
not materialized anywhere in the lexer or preprocessor. They are implemented as
a stack of `TokenLexer` objects, each of which expands a single macro. Each
lexer holds the tokens of the macro it is expanding (taken from the macro
definition), with the arguments substituted for function-like macros. Nested
macro expansions are handled recursively by pushing a new lexer onto the stack
(some other things, like the `##` operator, are also handled separately). Once
a token has been lexed and parsed, it is lost.

It might be possible to modify these lexers to keep the complete list of
expanded tokens and return it through a new preprocessor callback, but that
would be even more invasive than the current implementation. It would also be
wasteful, since it would always be on but almost never used.
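
As a rough illustration of the lexer stack described above, here is a toy
model in C++. It is only a sketch: `ToyTokenLexer`, `MacroTable`, `expand` and
`collectExpansion` are hypothetical names, not Clang's classes. It handles only
object-like macros and has no recursion guard. The point it shows is that an
expanded token exists only at the moment it is handed to the watcher, so
nothing retains the full expansion unless the watcher stores it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not Clang's classes.
using Token = std::string;
using MacroTable = std::map<std::string, std::vector<Token>>;

// Stand-in for a TokenLexer: walks the token list of one macro body
// (or of the main input) and remembers how far it has got.
struct ToyTokenLexer {
  const std::vector<Token> *Body;
  std::size_t Next;
};

// Expands object-like macros found in Input. Each fully expanded token is
// passed to Watcher once and is not stored anywhere by the expander itself.
template <typename WatcherFn>
void expand(const std::vector<Token> &Input, const MacroTable &Macros,
            WatcherFn Watcher) {
  std::vector<ToyTokenLexer> Stack;
  Stack.push_back({&Input, 0});
  while (!Stack.empty()) {
    // Assumption: macros are not self-referential; a real preprocessor
    // disables a macro while it is being expanded.
    assert(Stack.size() < 64 && "toy model has no recursion guard");
    ToyTokenLexer &Top = Stack.back();
    if (Top.Next == Top.Body->size()) {
      Stack.pop_back(); // this lexer is exhausted
      continue;
    }
    const Token &Tok = (*Top.Body)[Top.Next++];
    auto It = Macros.find(Tok);
    if (It != Macros.end()) {
      // Nested expansion: push a new lexer for the inner macro's body.
      Stack.push_back({&It->second, 0});
      continue;
    }
    Watcher(Tok); // the only place the expanded token is observable
  }
}

// A watcher that keeps every token, which is what a token watcher has to do
// if the expansion is needed later.
std::vector<Token> collectExpansion(const std::vector<Token> &Input,
                                    const MacroTable &Macros) {
  std::vector<Token> Out;
  expand(Input, Macros, [&Out](const Token &T) { Out.push_back(T); });
  return Out;
}
```

With `A` defined as `B + 1` and `B` defined as `2`, calling
`collectExpansion` on the tokens `x = A ;` yields `x = 2 + 1 ;`.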

> Unless it's a big ask, I'd ask you to explore what MacroExpansions encode. 
> Could we use that somehow to replace our token watcher implementation, and 
> just rely on that? If that is possible, we already have everything in the 
> PCH, so that part would get simplified. WDYT?

The `MacroExpansion` class doesn't actually contain much data: only the name
(for built-in macros) or the definition (for user macros), plus the source
range of the expansion. Unfortunately, the source range doesn't help in getting
the expanded tokens; it just points to the macro identifier in the source code.
The macro definition is also not useful on its own, since we don't know the
state of the preprocessor at that point. There doesn't seem to be a way to
reconstruct the expanded tokens/string from just this information, which is why
I added the expanded string there.

I remembered that clangd also shows expansions when hovering over a macro.
Their implementation is in `clang/lib/Tooling/Syntax/Tokens.cpp`, and they do
something similar to `MacroExpansionContext`. Using a token watcher and
preprocessor callbacks, they maintain a buffer containing spelling and
expansion tokens, plus a mapping between expansion locations and the tokens in
the expansion buffer.
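
The general shape of that approach (a token buffer plus a map keyed by
expansion location) can be sketched as below. This is a simplified toy under
stated assumptions: `ToyExpansionRecorder` and its methods are hypothetical and
do not match the classes in `Tokens.cpp` or `MacroExpansionContext`, locations
are plain integers, and only top-level expansions are tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy recorder, not the actual Clang or clangd implementation.
class ToyExpansionRecorder {
public:
  // Stand-in for a preprocessor callback: a top-level macro at source
  // offset Loc starts expanding. Opens an empty range in the buffer.
  void macroExpands(unsigned Loc) {
    Current = Loc;
    HasCurrent = true;
    Ranges[Loc] = {Expanded.size(), Expanded.size()};
  }

  // Stand-in for the token watcher: called for every fully expanded token.
  // FromMacro says whether the token came out of the current expansion.
  void onToken(const std::string &Tok, bool FromMacro) {
    Expanded.push_back(Tok);
    if (FromMacro) {
      assert(HasCurrent && "macro token seen before any expansion started");
      Ranges[Current].second = Expanded.size(); // extend the open range
    }
  }

  // Returns the expansion recorded at Loc as text, or "" if there is none.
  std::string expansionAt(unsigned Loc) const {
    auto It = Ranges.find(Loc);
    if (It == Ranges.end())
      return "";
    std::string Text;
    for (std::size_t I = It->second.first; I < It->second.second; ++I) {
      if (!Text.empty())
        Text += ' ';
      Text += Expanded[I];
    }
    return Text;
  }

private:
  std::vector<std::string> Expanded; // the expansion buffer
  // Expansion location -> half-open [begin, end) range in Expanded.
  std::map<unsigned, std::pair<std::size_t, std::size_t>> Ranges;
  unsigned Current = 0;
  bool HasCurrent = false;
};
```

The expanded text has to be captured while preprocessing runs; afterwards only
the recorder's buffer and map can answer a query for a given location.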

I don't think there is a way to get macro expansions from the PCH without
serializing them in some way ourselves.

https://github.com/llvm/llvm-project/pull/176126
