[clang] [analyzer][NFC] Introduce framework for checker families (PR #139256)

Donát Nagy via cfe-commits Tue, 20 May 2025 09:48:57 -0700

NagyDonat wrote:

> I'm happy as this PR looks right now, except for having that backward 
> compatibility overload for plugins.
> We can just move on with life and let them migrate once the next clang is 
> out. To me, the upgrade path looks straightforward and this isn't the only 
> API they will need to adjust anyway. They should brace for impact.


I first created commit 
https://github.com/llvm/llvm-project/pull/139256/commits/84eb5afa867f8f12db259053cb768919e0418ea8
 which eliminated the compatibility overloads and introduced `addMockChecker` 
to give a name to a pattern that appeared many times. However, as updated the 
doc comments, I realized that the fully general `addChecker` is needlessly 
verbose and hostile for plugin authors, so I created commit 
https://github.com/llvm/llvm-project/pull/139256/commits/04bfbf1551fa34602f54a85b45154bbaab4551ce
 which introduces other overloads that are more suitable for plugins.

Unfortunately these changes add ±100 changed lines, which makes this commit 
even larger -- but at least I got to fix some minor problems (e.g. obsolete 
docs and one case of the `mutable std::unique_ptr<BugType>` disease) in the 
plugin test code.

-------

> Could you give an example when it's misleading and not refers to a class 
> name? I figured the tablegen construct we have ensures this.

The second argument of the `CHECKER` macro is customarily called `CLASS`, but 
in fact it is a unique identifier of the particular checker fronted which is 
only used in the macro magic that assigns unique names to the registration 
methods.  In each checker family there is at most one checker whose `CLASS` 
matches the actual class name, because otherwise it would be impossible to 
define separate registration methods for them. (I also looked at the tablegen 
code in clang/utils/TableGen/ClangSACheckersEmitter.cpp and nothing tries to 
check that it is a valid class name.) As a concrete example, the `Checkers.inc` 
lines for the divzero checkers look like
```c++
CHECKER("core.DivideZero", DivZeroChecker, "Check for division by zero", 
"https://...";, false)
CHECKER("optin.taint.TaintedDiv", TaintedDivChecker, "Check for divisions ...", 
"https://...";, false)                                                           
                         
```
and you can see that `TaintedDivChecker` is not a real class name.

> We should be fine touching the plugin registration. This shouldn't be 
> disruptive.

Ok, noted -- I wasn't familiar with the plugin ecosystem.

> WDYM by that this identifier is not stable?

Under this plan 6, the "identifier" = debug name of the checker family is the 
`CLASS` field of the sub-checker (=CheckerFrontend) which is enabled first (on 
that particular run).

As the `CLASS`es of the subcheckers are necessarily all different, enabling 
different sub-checkers always changes the debug name unless dependency 
relationships guarantee that the first subchecker is always enabled. However, 
in this "first subchecker is a common dependency of all other subcheckers" 
architecture the first subchecker does _nothing_, so it's just a very complex 
form boilerplate.

> We could have a tablegen property like ImplementedBy<"MyChecker"> for the few 
> checker families we currently have, and say that by default they are 
> implemented by CLASS.

As I described above, we would need these `ImplementedBy<"MyChecker">` markers 
in *all* checker families for *almost all* subcheckers (because only one 
subchecker can own the real `CLASS` name) -- which would be more boilerplate 
than just declaring one `StringRef getDebugName() override {return 
"SomeCheckerFamily";}` per checker family.

--------

**Based on these, I'm still pleading you to reconsider your rejection of option 
1, because it is still the most clear solution to ensure that `getDebugName()` 
consistently returns the class name.**

If the class name of the `CheckerFamily` is not stringified automatically via 
option 2 (which you also reject), then it must be written down *somewhere*. 
Directly placing it at its single point of use (`getDebugName()`) is much 
simpler than hiding it away in the highly magical `Checkers.td` definitions and 
then threading yet another "move this value from _here_ to _there_" sphagetti 
into the already tangled checker registration code. For a novice checker 
developer who is familiar with C++, the task "provide a trivial implementation 
for this virtual method" is much easier than "put yet another field into this 
project-specific magical metadata file"!

Moreover, placing the class name into the current CheckerFrontend-centric 
`Checkers.td` (in a stable way that doesn't depend on the set of subcheckers 
enabled by the user) is not an easy task:
- If each checker family has a "modeling checker" as its first subchecker which 
is a common dependency of all the "real" checkers, then the full modeling 
checker (two registration method + `Checker<>` specification in `Checkers.td` + 
dependency relationships) **is boilerplate** – lots of boilerplate – which 
could be avoided in all other implementations.
- If `Checker<>` specification in `Checker.td` includes an 
`ImplementedBy<"MyChecker">` to correct situations where `CLASS` differs from 
the real class name, then each checker family requires _(N-1)_ of these 
`ImplementedBy` tags where $N$ is the number of subcheckers. If $N > 2$, this 
is more verbose than defining just one method override.
- If we update `Checkers.td` to ensure that the `CLASS` field always contains 
the real class name, then we need to add an extra field which can be used to 
distinguish the names of the register methods. Adding one 
`RegisterNameTag<"Foo">` to _each_ checker part (possibly excluding one) is 
more verbose than defining just one method override for the whole family.

I'm also suggesting plan 1, because it is a good "placeholder" that leaves 
space for later improvements that could provide an actually _good_ way of 
injecting the checker class name. For example in the future we could move 
towards my "alternative" plan, where I suggested that we should
> Extend the format of `Checkers.td` to support three kinds of checker entities:
>  -    ModelingCheckers that have a debug name and can be activated as 
> dependencies (and probably they can also have dependencies);
>  -    ReportingCheckers that have a user-facing name, can be enabled or 
> disabled by the user, can have checker options and depend on one or more 
> ModelingCheckers;
>  -    Checkers which are just a shortcut for introducing a ModelingChecker 
> and a single ReportingChecker that depends on it.

In this system each checker family would correspond to a single lightweight 
`ModelingChecker` entry in `Checkers.td` from which we could easily query the 
class name.



https://github.com/llvm/llvm-project/pull/139256
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer][NFC] Introduce framework for checker families (PR #139256)

Reply via email to