[clang] Fix array bound checker false negative (PR #161723)

Donát Nagy via cfe-commits Sat, 18 Oct 2025 01:31:07 -0700

https://github.com/NagyDonat requested changes to this pull request.


Thanks for investing work into this checker, but unfortunatel – as @haoNoQ also 
explained –, the suggested changes are incompatible with the principle that the 
analyzer must assume the best about things that it doesn't understand (e.g. if 
it cannot prove that an index cannot be in bounds, then it must assume that the 
index is in bounds).

This principle is important because the analyzer only sees and understands 
small fragments of the code: the simulated execution starts from arbitrarily 
picked functions (because it cannot reach everything from `main()`) and it is 
common that the analyzer doesn't see what happens inside function calls 
(because the definition is unavailable or the recursion/work limits are 
exceeded).

With the changes that you propose, the analyzer would flood the user with heaps 
of false positive warnings, because even if the programmer does perfect bounds 
checking, there would be many situations where the analyzer _doesn't see_ these 
bounds checks and (with your aggressive logic) reports an error.

For example in code like
```c++
bool index_in_bounds(int, size_t); // defined in different TU
int foo(int idx) {
  if (index_in_bounds(idx, ARRAY_SIZE)) {
    return array[idx];
  }
  return -1;
}
```
the analyzer with your patch would produce a false positive because (by 
default) it cannot look at the definition of `index_in_bounds` and determine 
that it does proper bounds checking.

The selection of the entrypoint can also be problematic, for example in a 
situation like
```c++
double get_value(int idx) {
  if (idx < 0 || idx > NUM_VALUES)
     return 0.0
  return get_value_impl(idx);
//... in a different file
double get_value_impl(int idx) {
  return array[idx];
}
```
the analyzer can select `get_value_impl` as an entrypoint -- and with your 
patch it would be reported as out-of-bounds access even if `get_value_impl` is 
only called from `get_value` (and other functions that do proper bounds 
checking).

The taint handling is a special case, because the fact that the analyzer sees 
the taint implies that it was able to follow that value from a taint source 
(e.g. user input, that can produce arbitrary values) to the array indexing -- 
and didn't see proper bounds checking along the way. Despite this, the taint 
checkers are an `optin` feature of the analyzer because their noisiness may be 
too much for some users.

I can understand that some users would like to have more aggressive warnings 
about unchecked array access, and perhaps your changes could be included as an 
off-by-default option that can be enabled by those users. However, even in this 
case you should try out your code on several real-world projects (e.g. stable 
open source code) because I fear that it may produce so many false positives 
that it would be useless in practice.

https://github.com/llvm/llvm-project/pull/161723
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Fix array bound checker false negative (PR #161723)

Reply via email to