https://github.com/NagyDonat commented:

Overall LGTM, my only remark is about comment clarity.

> The root cause is in convertOffsetsFromSvalToUnsigneds in RegionStore, which 
> returned UndefinedVal for any element index exceeding its sub-array extent, 
> conflating pointer arithmetic legality with memory initializedness.

I completely agree with this diagnosis.

> Whether cross-subobject pointer arithmetic constitutes undefined behavior per 
> C/C++ standards is a separate concern for individual checkers to diagnose. No 
> existing checker flags sub-array boundary crossing as UB, verified both 
> before and after https://github.com/llvm/llvm-project/pull/198346.

In fact I'm planning to change the behavior of `security.ArrayBound` to let it 
report situations like referencing `arr[1][5]` in an array declared as `int 
arr[4][3]` -- because it _is_ technically UB and the array bound checker should 
be able to report this.

However if the array bound checker does not report it directly (because it is 
disabled or the situation is more complex and some heuristic silences the 
warning) then the RegionStore should not report it indirectly (by returning an 
`Undefined` value which can trigger reports from various "use of undefined 
value" checkers).

--------------------

More generally, I think `UndefinedVal` should be reserved for the contents of 
memory that should not be read (e.g uninitialized local, area returned by 
`malloc`, perhaps a moved-from object etc.) and we should _never_ use it as the 
result of operations that trigger undefined behavior (e.g. out-of-bounds 
access, bad bitwise shift, zero division).

When the analyzer thinks that an undefined operation happens (on a certain 
execution path), it should immediately produce a bug report (which is focused 
on the root cause, can be cleanly suppressed or disabled etc.) instead of 
returning `UndefinedVal` (which will probably trigger some use-of-undefined 
checker later, and can be hard to understand or suppress, but may as well go 
unnoticed).

Returning `UndefinedVal` as a "second line of defense" is a very 
counter-productive approach: if the execution path wasn't terminated by an 
error outright, then the analyzer should admit that it doesn't understand the 
situation and use `UnknownVal`.

https://github.com/llvm/llvm-project/pull/200044
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to