haoNoQ wrote: > Looking at the check code in general, I have some concerns whether this check > should be implemented as a Clang Static Analyzer (CSA) check instead of a > clang-tidy check. My main reason is that we now have a lot of manual node > traversing, even an `ExecutionVisitor`. This whole _traversing part_ must be > already implemented in the core engine of the CSA. Even in [Getting > Involved](https://clang.llvm.org/extra/clang-tidy/Contributing.html#choosing-the-right-place-for-your-check) > guide it mentions that CSA should be used for control flow analysis, and we > clearly have it here. > > I kindly ask for an opinion of clang-tidy maintainers on this matter, > @HerrCai0907 @carlosgalvezp @PiotrZSL. > > Also, kindly ask CSA maintainers to take a brief look at this checker. From > your point of view, will it be easier and less code consuming to implement > such checker as a part of CSA? @steakhal @NagyDonat @haoNoQ.
Oh good question. Generally it's a bit more nuanced than that. The static analyzer is doing a lot more than AST traversal; it's doing so-called "path sensitive" analysis which tries to enumerate different *execution paths* in the program. Which is very good at finding realistic execution paths on which something "bad" *may* happen, and not suitable for all other kinds of analysis. Yes, I think this checker could be implemented as a path-sensitive checker in the static analyzer. It does indeed search for "bad" things that *may* happen, which is exactly what the path-sensitive engine is good at. Path-sensitive analysis is more precise, it may help you naturally eliminate certain classes of false positives that would be extremely difficult to eliminate manually. However, it's also significantly more expensive, so it'll run slower for the users who aren't already using path-other sensitive checks (the performance cost will be negligible for users who are already enabling at least one other path-sensitive checker). Additionally, because of being more expensive and explosive in complexity, it may suffer from more false negatives, simply because it's running out of the analysis "budget". This won't necessarily eliminate a lot of code from the checker. Path-sensitive checkers often end up more simple and elegant. You'll still need to identify the places where the execution may be unsequenced, we won't do that for you. But then you'll put markers in the path-sensitive state when you're inside those statements, and focus on responding to global variable accesses while those markers are present. We'll do the rest of the traversal for you. Path-sensitive analysis may also save you a lot of follow-up polish work (I haven't looked at how much of that you've already done). ---- Now, in practice, the best thing about path-sensitive analysis is that it naturally helps you avoid false positives of the following nature: ```c++ int globalVar; int foo(bool shouldInc) { if (shouldInc) globalVar++; return globalVar; } void bar() { globalVar + foo(false); // should not warn } ``` In this case the problematic mutation cannot happen in practice because the branch on which the increment happens is entirely unreachable in the context of `bar()`. The function `foo()` may increment the variable in other situations but it will never do that when called from inside `bar()` specifically. These kinds of false positives may happen in a variety of different ways and you'll have a very hard time pattern-matching them all down on the AST level without a powerful principled solution that the path-sensitive engine implements. If you're using path-sensitive analysis, the false positive is eliminated automatically, by construction, without any effort on your part. You will simply never notice the increment of the global variable when you're analyzing `bar()` so you won't have to worry about reacting to it incorrectly. The analyzer simply knows that this execution path does not exist, so it'll never show it to you. Whether you actually worry about these false positives, is entirely up to you though. It's a matter of your relationship with your users, a matter of understanding their demands, and also a matter of observing the behavior of the checker on real-world code. I cannot answer that question for you, you'll need to experiment. We've had a similar situation in our "don't call virtual methods on a partially constructed object" checker (https://clang.llvm.org/docs/analyzer/checkers.html#cplusplus-purevirtualcall-c, https://clang.llvm.org/docs/analyzer/checkers.html#optin-cplusplus-virtualcall). It turned out that in many cases the code was deliberately written in such a way that the constructor may call a function in turn may eventually call a virtual method but in reality the constructor passes a flag that makes sure the virtual method is never called from inside that function. So our initial attempt to avoid path-sensitive analysis has failed and we had to rewrite the checker as a path-sensitive checker. ---- Path-sensitive analysis also offers other bonuses. For example, it's able to track references and aliasing to some extent, so you're getting the following case covered "for free": ```c++ int globalVar; int foo(int &var) { return var++; } void bar() { int &var = globalVar; globalVar + foo(var); // should warn } ``` It's also able to resolve calls through function pointers and virtual calls in some cases: ```c++ int globalVar; int foo() { return globalVar++; } void bar() { int (*func)() = foo; globalVar + func(); // should warn } ``` But these are relatively minor and they probably won't ruin your checker if you don't support them. ---- So, overall, up to you. If you go for path-sensitive analysis, it would be a major rewrite, and not without downsides. You probably don't need to do it just for the purposes of elegance or code reuse. (Though I definitely won't stop you. I love elegance and code reuse.) I think the main question you need to answer is whether the increased precision - in terms of both false positives and false negatives - is of value to you. The potential false negatives due to budget limitations are probably the second most important factor. But either way, it all depends on your experimental data and on your relationship with your future users. https://github.com/llvm/llvm-project/pull/130421 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits