[clang] [llvm] [NFC][analyzer] Extract bounds checking library (PR #202372)

Donát Nagy via cfe-commits Mon, 29 Jun 2026 09:48:32 -0700

================
@@ -0,0 +1,211 @@
+//===- BoundsChecking.h - Bounds checking related APIs ----------*- C++ 
-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+//  This file defines APIs for performing a bounds check (i.e. comparing a
+//  symbolic Offset value to zero and a symbolic Extent value) and composing
+//  descriptions that explain its results.
+//
+//  This is intended as a replacement for `ProgramState::assumeInBound` to
+//  avoid its incorrect logic and compensate for deficiencies of other parts of
+//  the analyzer.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_BOUNDSCHECKING_H
+#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_BOUNDSCHECKING_H
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "llvm/Support/FormatVariadic.h"
+#include <optional>
+
+namespace clang {
+namespace ento {
+
+/// If `E` is an array subscript expression with a base that is "clean" (= not
+/// modified by pointer arithmetic = the beginning of a memory region), return
+/// it as a pointer to ArraySubscriptExpr; otherwise return nullptr.
+/// This helper function is used by two separate heuristics that are only valid
+/// in these "clean" cases.
+const ArraySubscriptExpr *getAsCleanArraySubscriptExpr(const Expr *E,
+                                                       const CheckerContext 
&C);
+
+class SizeUnit {
+  QualType AsType;
+  int64_t AsCharUnits;
+
+  SizeUnit() : AsType(), AsCharUnits(1) {}
+
+public:
+  SizeUnit(QualType T, const ASTContext &ACtx)
+      : AsType(T), AsCharUnits(ACtx.getTypeSizeInChars(T).getQuantity()) {
+    assert(!T.isNull());
+  }
+
+  static SizeUnit bytes() { return SizeUnit(); }
+
+  bool isBytes() const { return AsType.isNull(); }
+
+  /// If `E` is a "clean" array subscript expression, return the type of the
+  /// accessed element; otherwise return 'Bytes' because that's the best (or
+  /// least bad) option for the assumption messages that use this.
+  static SizeUnit forExpr(const Expr *E, const CheckerContext &C) {
+    const auto *ASE = getAsCleanArraySubscriptExpr(E, C);
+    if (!ASE)
+      return bytes();
+
+    return SizeUnit(ASE->getType(), C.getASTContext());
+  }
+
+  /// Return the element type that is "natural" for reporting out-of-bounds
+  /// memory access to 'Location'.
+  /// FIXME: It is unfortunate that this heuristic differs from the heuristic
+  /// used for reporting assumption (`SizeUnit::forExpr`).
+  static SizeUnit forSVal(SVal Location, const ASTContext &ACtx) {
+    const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
+    assert(EReg && "this checker only handles element access");
+    return SizeUnit(EReg->getElementType(), ACtx);
+  }
+
+  int64_t asCharUnits() const { return AsCharUnits; }
+
+  std::string asExtentDesc() const {
+    if (isBytes())
+      return "the extent of";
+    return llvm::formatv("the number of '{0}' elements in",
+                         AsType.getAsString());
+  }
+
+  std::string asElementName() const {
+    if (isBytes())
+      return "byte";
+    return llvm::formatv("'{0}' element", AsType.getAsString());
+  }
+
+  std::string getOffsetName() const {
+    return isBytes() ? "byte offset" : "index";
+  }
+
+  /// Try to divide `Val1` and `Val2` (in place) by `this->asCharUnits()` and
+  /// return true if it can be performed without remainder. The values `Val1`
+  /// and `Val2` may be nullopt and in that case the corresponding division is
+  /// considered to be successful.
+  bool tryConvertValuesFromBytes(std::optional<int64_t> &Val1,
+                                 std::optional<int64_t> &Val2) const;
+};
+
+struct Messages {
+  std::string Short, Full;
+};
+
+enum class BadOffsetKind { Negative, Overflowing, Indeterminate };
+
+constexpr llvm::StringLiteral Adjectives[] = {"a negative", "an overflowing",
+                                              "a negative or overflowing"};
+inline StringRef asAdjective(BadOffsetKind Problem) {
+  return Adjectives[static_cast<int>(Problem)];
+}
+
+constexpr llvm::StringLiteral Prepositions[] = {"preceding", "after the end 
of",
+                                                "around"};
+inline StringRef asPreposition(BadOffsetKind Problem) {
+  return Prepositions[static_cast<int>(Problem)];
+}
+
+struct CheckFlags {
+  bool CheckUnderflow;
+  bool OffsetObviouslyNonnegative;
+  bool AcceptPastTheEnd;
+};
+
+class BoundsCheckResult {
+public:
+  enum class Kind { Underflow, Overflow, TaintBug, Paradox, Valid };
----------------
NagyDonat wrote:


> There might be an overlap with `BadOffsetKind`?

Yes, there is overlap: currently we have a mapping
```
  std::optional<BadOffsetKind> getBadOffsetKind() const {       
    switch (K) {                                                
    case Kind::Underflow:                                       
      return BadOffsetKind::Negative;                           
    case Kind::Overflow:                                        
      return assumedNonNegative() ? BadOffsetKind::Indeterminate
                                  : BadOffsetKind::Overflowing; 
    default:                                                    
      return std::nullopt;                                      
    }                                                           
  }
```
I will clean this up, probably by merging `Kind::Underflow` and 
`Kind::Overflow` into a single `Kind::Invalid` (and then `BadOffsetKind` will 
be subcategories of this `Invalid` kind).

This will cleanly separate the high-level result (Valid, Invalid, TaintBug, 
Paradox) and the subcategories of Invalid (Negative, Overflowing, 
Indeterminate) which require different messaging, but are equivalent otherwise. 

> Also, I wonder if `Taint` is orthogonal. E.g., if something is tainted and 
> only underflow is possible because upper bounds are checked but lower bounds 
> are not.

It is possible to subdivide `TaintBug` into the same three subcategories  
(`Negative`, `Overflowing`, `Indeterminate` – the `BadOffsetKind`s) that appear 
within `Invalid`.

_By the way, as of now `security.ArrayBound` cannot produce "index tainted, 
upper bounds checked, lower bounds unchecked" (i.e. `TaintBug` / `Negative`) 
reports. IIRC this limitation was inherited from `ArrayBoundV2`, but until now 
I did not notice. I will fix this in the near future in a separate non-NFC 
commit._

---------

> `Paradox` is not really a terminology we use in the static analyzer, I think. 
> Maybe we found a contradiction? But I would not call it paradox.

I feel that "contradiction" is too euphemistic -- it looks as if there is a 
disagreement between two equally hypothetical assumptions, while in fact we 
just found definite proof that the analyzer engine sprouts nonsense 
(confidently claims that an unsigned variable holds a negative value). [1]

If `Paradox` is too abstract and unusual, I could suggest `CorruptedState`, 
`AnalyzerBug` or similar names to describe this situation.

[1] I know that we cannot introduce proper cast modeling because the engine is 
not able to "see through" casts :disappointed:

https://github.com/llvm/llvm-project/pull/202372
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NFC][analyzer] Extract bounds checking library (PR #202372)

Reply via email to