[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Donát Nagy via cfe-commits Tue, 01 Oct 2024 07:54:37 -0700

NagyDonat wrote:

> We can always have a map, mapping ForStmts to finished iterations. Basically, 
> from the ErrorNode, we could walk backwards (as usual in a visitor like 
> that), and check if the current ProgramPoint is PostStmt. We would then grab 
> the wrapped Stmt to see if it's the terminator Stmt of a For-like statement. 
> If so, them bump the counter in the map of that loop. Once we finished the 
> visitation, in the `finalizeVisitor` we could see if any of the loops had a 
> wrong counter, and mark the report invalid.
> 
> If we want to see if we did a state-split when we left the loop we would have 
> 2 options:
> 
>     1. Take the State and the condition expression and query by using 
> `assume` as usual. We should get the same answer as during the egraph 
> construction.
> 
>     2. Have a look at the **original** exploded graph and check if we have 
> only have a single successor node.


That's a clever solution, but I'm still not convinced that overall it'd be 
easier to understand than what I wrote.

Note that in addition to checking for state split when the analyzer _leaves_ 
the loop, the heuristics also need to check for state split when the analyzer 
_remains_ in the loop, so for each loop we would need to manage a list like 
"left the loop with[out] assumption, before that remained in the loop with[out] 
assumption, before that remained in the loop with[out] assumption, ..."

This already awkward code would become even more complex if we determined that 
we need accurate iteration counts in cases when a loop is reached several times 
(from different function calls or from different iterations of an outer loop). 
Implementing this would be already difficult in my eager model (see the last 
few paragraphs of this inline comment: 
https://github.com/llvm/llvm-project/pull/109804#discussion_r1782588587 ), and 
the backward-iteration logic would make this even more complicated.

> Note that the exploded graph that the VisitNode traverses is stripped, thus 
> it's a single path of nodes, without ambiguities of successors or predecessor 
> nodes.

Thanks for alerting me :sweat_smile: I didn't have the opportunity to run into 
this trap (yet), and now I'll try to remember it.

> > In general, I feel that `BugReporterVisitor` is a complicated piece of 
> > artillery [...]
> 
> Yes, but I also wish we could achieve this without making the internals more 
> complicated. Having a separated component doing this has a lot of value.
>
> > Moreover, if this suppression works well, then perhaps we could generalize 
> > it and add a config option which enables it for all checkers. If we want to 
> > do this, it will be nice to have this logic in `processBranch()` where we 
> > can immediately discard the "weak" transitions (instead of just marking 
> > them).
> 
> A BugReportVisitor is just as generic to me.

I see that there is value in keeping the internals as simple as possible, but I 
also feel that hiding pieces of the engine logic in scattered visitors also 
makes the code harder to understand. (Especially since the visitor will 
activate at a very different stage of the analysis, so its influence is 
completely invisible at the engine logic where it "belongs".)

I think this is a "simple is better than complex; complex is better than 
complicated" case (to quote the zen of python): we should try to keep the 
engine as simple as possible, but we should not hide parts of the necessary 
complexity with complicated tricks 

If we end up with generalizing this and the "weak" transitions will be 
recognized by many checkers (or we discard them altogether), then it would be 
important to have this logic in its "natural" place in the engine (instead of 
being hidden away in a visitor).

On the other hand, if this heuristic remains limited to ArrayBoundV2, then I 
agree that architecturally it would be better to place this logic in a visitor 
in ArrayBoundCheckerV2.cpp -- but even in this case I feel that this concern is 
counterbalanced by the fact that the visitor code would be significantly more 
awkward and complex than the direct eager implementation.

Moreover, if we end up with enabling this heuristic for _all_ the checkers then 
we can outright discard the weak transitions, which could provide a significant 
performance improvement compared to both the visitor-based alternative 
implementation and even the status quo.

If we eagerly prune the exploded graph by not following the weak loop 
assumptions, then we could avoid visiting lots of execution paths, while
- the visitor-based approach would just discard results from these paths after 
wasting time on analyzing them;
- the status quo displays results found on these paths, but I suspect that the 
results coming after assuming 0 or 3+ loop iterations are redundant and very 
similar to the results after 1 or 2 (with a few differences where I'd prefer to 
see the findings from 1 or 2 iterations).

 On complex-real-world code I think it's fair to assume that the average bug 
path contains at least one loop where the analyzer can't evaluate the condition 
(currently these are displayed as skipped, because that's the shortest -- but 
sometimes visibly unrealistic -- path) and this would imply that (very roughly) 
half of the execution paths are affected by weak loop assumptions, and I don't 
think that we should discard _so_ many paths.
 
> If you prefer, I could also play with this concept. I'll probably have maybe 
> 1 day this week where I could work on this.

Thanks for the offer, but I right now think I would prefer to keep the 
direct/eager approach. However, even the visitor-based approach would be better 
than nothing, so if you say that the engine modifications are unacceptable, 
then we could switch to a visitor. (However, I can probably write the visitor 
on my own following the approach that you suggested. I'm not incapable of 
writing bug reporter visitors, I just think that they're ugly.)

I also thought about finding a third way with a `check::BranchCondition` 
callback, but unfortunately that callback appears to be a terminal stage victim 
of code decay. (I'll probably write a github ticket about this in the upcoming 
days.)

https://github.com/llvm/llvm-project/pull/109804
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Reply via email to