haoNoQ wrote:

> Looking at the check code in general, I have some concerns whether this check 
> should be implemented as a Clang Static Analyzer (CSA) check instead of a 
> clang-tidy check. My main reason is that we now have a lot of manual node 
> traversing, even an `ExecutionVisitor`. This whole _traversing part_ must be 
> already implemented in the core engine of the CSA. Even in [Getting 
> Involved](https://clang.llvm.org/extra/clang-tidy/Contributing.html#choosing-the-right-place-for-your-check)
>  guide it mentions that CSA should be used for control flow analysis, and we 
> clearly have it here.
> 
> I kindly ask for an opinion of clang-tidy maintainers on this matter, 
> @HerrCai0907 @carlosgalvezp @PiotrZSL.
> 
> Also, kindly ask CSA maintainers to take a brief look at this checker. From 
> your point of view, will it be easier and less code consuming to implement 
> such checker as a part of CSA? @steakhal @NagyDonat @haoNoQ.

Oh good question. Generally it's a bit more nuanced than that. The static 
analyzer is doing a lot more than AST traversal; it's doing so-called "path 
sensitive" analysis which tries to enumerate different *execution paths* in the 
program. Which is very good at finding realistic execution paths on which 
something "bad" *may* happen, and not suitable for all other kinds of analysis.

Yes, I think this checker could be implemented as a path-sensitive checker in 
the static analyzer. It does indeed search for "bad" things that *may* happen, 
which is exactly what the path-sensitive engine is good at. Path-sensitive 
analysis is more precise, it may help you naturally eliminate certain classes 
of false positives that would be extremely difficult to eliminate manually. 
However, it's also significantly more expensive, so it'll run slower for the 
users who aren't already using path-other sensitive checks (the performance 
cost will be negligible for users who are already enabling at least one other 
path-sensitive checker). Additionally, because of being more expensive and 
explosive in complexity, it may suffer from more false negatives, simply 
because it's running out of the analysis "budget".

This won't necessarily eliminate a lot of code from the checker. Path-sensitive 
checkers often end up more simple and elegant. You'll still need to identify 
the places where the execution may be unsequenced, we won't do that for you. 
But then you'll put markers in the path-sensitive state when you're inside 
those statements, and focus on responding to global variable accesses while 
those markers are present. We'll do the rest of the traversal for you. 
Path-sensitive analysis may also save you a lot of follow-up polish work (I 
haven't looked at how much of that you've already done).

----

Now, in practice, the best thing about path-sensitive analysis is that it 
naturally helps you avoid false positives of the following nature:
```c++
int globalVar;

int foo(bool shouldInc) {
  if (shouldInc)
    globalVar++;
  return globalVar;
}

void bar() {
  globalVar + foo(false); // should not warn
}
```

In this case the problematic mutation cannot happen in practice because the 
branch on which the increment happens is entirely unreachable in the context of 
`bar()`. The function `foo()` may increment the variable in other situations 
but it will never do that when called from inside `bar()` specifically.

These kinds of false positives may happen in a variety of different ways and 
you'll have a very hard time pattern-matching them all down on the AST level 
without a powerful principled solution that the path-sensitive engine 
implements. If you're using path-sensitive analysis, the false positive is 
eliminated automatically, by construction, without any effort on your part. You 
will simply never notice the increment of the global variable when you're 
analyzing `bar()` so you won't have to worry about reacting to it incorrectly. 
The analyzer simply knows that this execution path does not exist, so it'll 
never show it to you.

Whether you actually worry about these false positives, is entirely up to you 
though. It's a matter of your relationship with your users, a matter of 
understanding their demands, and also a matter of observing the behavior of the 
checker on real-world code. I cannot answer that question for you, you'll need 
to experiment.

We've had a similar situation in our "don't call virtual methods on a partially 
constructed object" checker 
(https://clang.llvm.org/docs/analyzer/checkers.html#cplusplus-purevirtualcall-c,
 
https://clang.llvm.org/docs/analyzer/checkers.html#optin-cplusplus-virtualcall).
 It turned out that in many cases the code was deliberately written in such a 
way that the constructor may call a function in turn may eventually call a 
virtual method but in reality the constructor passes a flag that makes sure the 
virtual method is never called from inside that function. So our initial 
attempt to avoid path-sensitive analysis has failed and we had to rewrite the 
checker as a path-sensitive checker.

----

Path-sensitive analysis also offers other bonuses. For example, it's able to 
track references and aliasing to some extent, so you're getting the following 
case covered "for free":

```c++
int globalVar;

int foo(int &var) {
  return var++;
}

void bar() {
  int &var = globalVar;
  globalVar + foo(var); // should warn
}
```

It's also able to resolve calls through function pointers and virtual calls in 
some cases:

```c++
int globalVar;

int foo() {
  return globalVar++;
}

void bar() {
  int (*func)() = foo;
  globalVar + func(); // should warn
}
```

But these are relatively minor and they probably won't ruin your checker if you 
don't support them.

----

So, overall, up to you. If you go for path-sensitive analysis, it would be a 
major rewrite, and not without downsides. You probably don't need to do it just 
for the purposes of elegance or code reuse. (Though I definitely won't stop 
you. I love elegance and code reuse.)

I think the main question you need to answer is whether the increased precision 
- in terms of both false positives and false negatives - is of value to you. 
The potential false negatives due to budget limitations are probably the second 
most important factor. But either way, it all depends on your experimental data 
and on your relationship with your future users.

https://github.com/llvm/llvm-project/pull/130421
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to