https://github.com/NagyDonat commented:
Overall LGTM, my only remark is about comment clarity. > The root cause is in convertOffsetsFromSvalToUnsigneds in RegionStore, which > returned UndefinedVal for any element index exceeding its sub-array extent, > conflating pointer arithmetic legality with memory initializedness. I completely agree with this diagnosis. > Whether cross-subobject pointer arithmetic constitutes undefined behavior per > C/C++ standards is a separate concern for individual checkers to diagnose. No > existing checker flags sub-array boundary crossing as UB, verified both > before and after https://github.com/llvm/llvm-project/pull/198346. In fact I'm planning to change the behavior of `security.ArrayBound` to let it report situations like referencing `arr[1][5]` in an array declared as `int arr[4][3]` -- because it _is_ technically UB and the array bound checker should be able to report this. However if the array bound checker does not report it directly (because it is disabled or the situation is more complex and some heuristic silences the warning) then the RegionStore should not report it indirectly (by returning an `Undefined` value which can trigger reports from various "use of undefined value" checkers). -------------------- More generally, I think `UndefinedVal` should be reserved for the contents of memory that should not be read (e.g uninitialized local, area returned by `malloc`, perhaps a moved-from object etc.) and we should _never_ use it as the result of operations that trigger undefined behavior (e.g. out-of-bounds access, bad bitwise shift, zero division). When the analyzer thinks that an undefined operation happens (on a certain execution path), it should immediately produce a bug report (which is focused on the root cause, can be cleanly suppressed or disabled etc.) instead of returning `UndefinedVal` (which will probably trigger some use-of-undefined checker later, and can be hard to understand or suppress, but may as well go unnoticed). Returning `UndefinedVal` as a "second line of defense" is a very counter-productive approach: if the execution path wasn't terminated by an error outright, then the analyzer should admit that it doesn't understand the situation and use `UnknownVal`. https://github.com/llvm/llvm-project/pull/200044 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
