haoNoQ wrote:
> Looking at the check code in general, I have some concerns whether this check
> should be implemented as a Clang Static Analyzer (CSA) check instead of a
> clang-tidy check. My main reason is that we now have a lot of manual node
> traversing, even an `ExecutionVisitor`. This whole _traversing part_ must be
> already implemented in the core engine of the CSA. Even in [Getting
> Involved](https://clang.llvm.org/extra/clang-tidy/Contributing.html#choosing-the-right-place-for-your-check)
> guide it mentions that CSA should be used for control flow analysis, and we
> clearly have it here.
>
> I kindly ask for an opinion of clang-tidy maintainers on this matter,
> @HerrCai0907 @carlosgalvezp @PiotrZSL.
>
> Also, kindly ask CSA maintainers to take a brief look at this checker. From
> your point of view, will it be easier and less code consuming to implement
> such checker as a part of CSA? @steakhal @NagyDonat @haoNoQ.
Oh good question. Generally it's a bit more nuanced than that. The static
analyzer is doing a lot more than AST traversal; it's doing so-called "path
sensitive" analysis which tries to enumerate different *execution paths* in the
program. Which is very good at finding realistic execution paths on which
something "bad" *may* happen, and not suitable for all other kinds of analysis.
Yes, I think this checker could be implemented as a path-sensitive checker in
the static analyzer. It does indeed search for "bad" things that *may* happen,
which is exactly what the path-sensitive engine is good at. Path-sensitive
analysis is more precise, it may help you naturally eliminate certain classes
of false positives that would be extremely difficult to eliminate manually.
However, it's also significantly more expensive, so it'll run slower for the
users who aren't already using path-other sensitive checks (the performance
cost will be negligible for users who are already enabling at least one other
path-sensitive checker). Additionally, because of being more expensive and
explosive in complexity, it may suffer from more false negatives, simply
because it's running out of the analysis "budget".
This won't necessarily eliminate a lot of code from the checker. Path-sensitive
checkers often end up more simple and elegant. You'll still need to identify
the places where the execution may be unsequenced, we won't do that for you.
But then you'll put markers in the path-sensitive state when you're inside
those statements, and focus on responding to global variable accesses while
those markers are present. We'll do the rest of the traversal for you.
Path-sensitive analysis may also save you a lot of follow-up polish work (I
haven't looked at how much of that you've already done).
----
Now, in practice, the best thing about path-sensitive analysis is that it
naturally helps you avoid false positives of the following nature:
```c++
int globalVar;
int foo(bool shouldInc) {
if (shouldInc)
globalVar++;
return globalVar;
}
void bar() {
globalVar + foo(false); // should not warn
}
```
In this case the problematic mutation cannot happen in practice because the
branch on which the increment happens is entirely unreachable in the context of
`bar()`. The function `foo()` may increment the variable in other situations
but it will never do that when called from inside `bar()` specifically.
These kinds of false positives may happen in a variety of different ways and
you'll have a very hard time pattern-matching them all down on the AST level
without a powerful principled solution that the path-sensitive engine
implements. If you're using path-sensitive analysis, the false positive is
eliminated automatically, by construction, without any effort on your part. You
will simply never notice the increment of the global variable when you're
analyzing `bar()` so you won't have to worry about reacting to it incorrectly.
The analyzer simply knows that this execution path does not exist, so it'll
never show it to you.
Whether you actually worry about these false positives, is entirely up to you
though. It's a matter of your relationship with your users, a matter of
understanding their demands, and also a matter of observing the behavior of the
checker on real-world code. I cannot answer that question for you, you'll need
to experiment.
We've had a similar situation in our "don't call virtual methods on a partially
constructed object" checker
(https://clang.llvm.org/docs/analyzer/checkers.html#cplusplus-purevirtualcall-c,
https://clang.llvm.org/docs/analyzer/checkers.html#optin-cplusplus-virtualcall).
It turned out that in many cases the code was deliberately written in such a
way that the constructor may call a function in turn may eventually call a
virtual method but in reality the constructor passes a flag that makes sure the
virtual method is never called from inside that function. So our initial
attempt to avoid path-sensitive analysis has failed and we had to rewrite the
checker as a path-sensitive checker.
----
Path-sensitive analysis also offers other bonuses. For example, it's able to
track references and aliasing to some extent, so you're getting the following
case covered "for free":
```c++
int globalVar;
int foo(int &var) {
return var++;
}
void bar() {
int &var = globalVar;
globalVar + foo(var); // should warn
}
```
It's also able to resolve calls through function pointers and virtual calls in
some cases:
```c++
int globalVar;
int foo() {
return globalVar++;
}
void bar() {
int (*func)() = foo;
globalVar + func(); // should warn
}
```
But these are relatively minor and they probably won't ruin your checker if you
don't support them.
----
So, overall, up to you. If you go for path-sensitive analysis, it would be a
major rewrite, and not without downsides. You probably don't need to do it just
for the purposes of elegance or code reuse. (Though I definitely won't stop
you. I love elegance and code reuse.)
I think the main question you need to answer is whether the increased precision
- in terms of both false positives and false negatives - is of value to you.
The potential false negatives due to budget limitations are probably the second
most important factor. But either way, it all depends on your experimental data
and on your relationship with your future users.
https://github.com/llvm/llvm-project/pull/130421
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits