================
@@ -0,0 +1,362 @@
+==================================================
+``-fbounds-safety``: Enforcing bounds safety for C
+==================================================
+
+.. contents::
+   :local:
+
+Overview
+========
+
+``-fbounds-safety`` is a C extension that enforces bounds safety to prevent 
out-of-bounds (OOB) memory accesses, which remain a major source of security 
vulnerabilities in C. ``-fbounds-safety`` aims to eliminate this class of bugs 
by turning OOB accesses into deterministic traps.
+
+The ``-fbounds-safety`` extension offers bounds annotations that programmers 
can use to attach bounds to pointers. For example, programmers can add the 
``__counted_by(N)`` annotation to parameter ``ptr``, indicating that the 
pointer has ``N`` valid elements:
+
+.. code-block:: c
+
+   void foo(int *__counted_by(N) ptr, size_t N);
+
+Using this bounds information, the compiler inserts bounds checks on every 
pointer dereference, ensuring that the program does not access memory outside 
the specified bounds. The compiler requires programmers to provide enough 
bounds information so that the accesses can be checked at either run time or 
compile time — and it rejects code if it cannot.
+
+The most important contribution of ``-fbounds-safety`` is how it reduces the 
programmer’s annotation burden by reconciling bounds annotations at ABI 
boundaries with the use of implicit wide pointers (a.k.a. “fat” pointers) that 
carry bounds information on local variables without the need for annotations. 
We designed this model so that it preserves ABI compatibility with C while 
minimizing adoption effort.
+
+The ``-fbounds-safety`` extension has been adopted on millions of lines of 
production C code and proven to work in a consumer operating system setting. 
The extension was designed to enable incremental adoption — a key requirement 
in real-world settings where modifying an entire project and its dependencies 
all at once is often not possible. It also addresses multiple other 
practical challenges that have made existing approaches to safer C dialects 
difficult to adopt, offering these properties that make it widely adoptable in 
practice:
+
+* It is designed to preserve the Application Binary Interface (ABI).
+* It interoperates well with plain C code.
+* It can be adopted partially and incrementally while still providing safety 
benefits.
+* It is syntactically and semantically compatible with C.
+* Consequently, source code that adopts the extension can continue to be 
compiled by toolchains that do not support the extension.
+* It has a relatively low adoption cost.
+* It can be implemented on top of Clang.
+
+This document discusses the key design points of ``-fbounds-safety``. It will 
be actively updated with a more detailed specification. The 
implementation plan can be found in `Implementation plans for -fbounds-safety 
<BoundsSafetyImplPlans.rst>`_.
+
+Programming Model
+=================
+
+Overview
+--------
+
+``-fbounds-safety`` ensures that pointers are not used to access memory beyond 
their bounds by performing bounds checking. If a bounds check fails, the 
program will deterministically trap before out-of-bounds memory is accessed.
+
+In our model, every pointer has an explicit or implicit bounds attribute that 
determines its bounds and ensures guaranteed bounds checking. Consider the 
example below where the ``__counted_by(count)`` annotation indicates that 
parameter ``p`` points to a buffer of integers containing ``count`` elements. 
An off-by-one error in the loop condition makes ``p[i]`` an 
out-of-bounds access during the loop’s final iteration. The compiler inserts a 
bounds check before ``p`` is dereferenced to ensure that the access remains 
within the specified bounds.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, unsigned count) {
+      // off-by-one error: the condition should be i < count
+      for (unsigned i = 0; i <= count; ++i) {
+         // bounds check inserted:
+         //   if (i >= count) trap();
+         p[i] = i;
+      }
+   }
+
+A bounds annotation defines an invariant for the pointer type, and the model 
ensures that this invariant remains true. In the example below, pointer ``p`` 
annotated with ``__counted_by(count)`` must always point to a memory buffer 
containing at least ``count`` elements of the pointee type. Increasing the 
value of ``count``, as the function body does, would violate this invariant and 
permit out-of-bounds access through the pointer. To avoid this, the compiler emits 
either a compile-time error or a run-time trap. Section `Maintaining 
correctness of bounds annotations`_ provides more details about the programming 
model.
+
+.. code-block:: c
+
+   void foo(int *__counted_by(count) p, size_t count) {
+      count++; // violates the invariant of __counted_by
+   }
+
+The requirement to annotate all pointers with explicit bounds information 
could present a significant adoption burden. To tackle this issue, the model 
incorporates the concept of a “wide pointer” (a.k.a. fat pointer) – a larger 
pointer that carries bounds information alongside the pointer value. Using 
wide pointers can reduce the adoption burden because a wide pointer carries its 
bounds information internally, which eliminates the need for explicit bounds annotations. 
However, wide pointers differ from standard C pointers in their data layout, 
which may result in incompatibilities with the application binary interface 
(ABI). Breaking the ABI complicates interoperability with external code that 
has not adopted the same programming model.
+
+``-fbounds-safety`` harmonizes the wide pointer and the bounds annotation 
approaches to reduce the adoption burden while maintaining the ABI. In this 
model, local variables of pointer type are implicitly treated as wide pointers, 
allowing them to carry bounds information without requiring explicit bounds 
annotations. This approach does not impact the ABI, as local variables are 
hidden from the ABI. Pointers associated with any other variables are treated 
as single object pointers (i.e., ``__single``), ensuring that they always have 
the tightest bounds by default and offering a strong bounds safety guarantee.
+
+By implementing default bounds annotations based on ABI visibility, a 
considerable portion of C code can operate without modifications within this 
programming model, reducing the adoption burden.
+
+The rest of the section will discuss individual bounds annotations and the 
programming model in more detail.
+
+Bounds annotations
+------------------
+
+Annotation for pointers to a single object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The C language allows pointer arithmetic on arbitrary pointers and this has 
been a source of many bounds safety issues. In practice, many pointers are 
merely pointing to a single object and incrementing or decrementing such a 
pointer immediately makes the pointer go out-of-bounds. To prevent this, 
``-fbounds-safety`` provides the ``__single`` annotation, which makes 
pointer arithmetic on annotated pointers a compile-time error.
+
+* ``__single`` : indicates that the pointer is either pointing to a single 
object or null. Hence, ``__single`` pointers permit neither pointer 
arithmetic nor subscripting with a non-zero index. Dereferencing a 
``__single`` pointer is allowed but it requires a null check. Upper and lower 
bounds checks are not required because the ``__single`` pointer should point to 
a valid object unless it’s null.
+
+We use ``__single`` as the default annotation for ABI-visible pointers. This 
gives strong security guarantees in that these pointers cannot be incremented 
or decremented unless they have an explicit, overriding bounds annotation that 
can be used to verify the safety of the operation. The compiler issues an error 
when a ``__single`` pointer is utilized for pointer arithmetic or array access, 
as these operations would immediately cause the pointer to exceed its bounds. 
Consequently, this prompts programmers to provide sufficient bounds information 
for such pointers. In the following example, the pointer parameter ``p`` is 
``__single`` by default but is used for array access. As a result, the compiler 
emits an error that suggests adding ``__counted_by`` to the pointer.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *p, unsigned count) {
+      for (unsigned i = 0; i < count; ++i) {
+         p[i] = i; // error
+      }
+   }
+
+
+External bounds annotations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+“External” bounds annotations provide a way to express a relationship between 
a pointer variable and another variable (or expression) containing the bounds 
information of the pointer. In the following example, the ``__counted_by(count)`` 
annotation expresses the bounds of parameter ``p`` using another parameter, ``count``. 
This model works naturally with many C interfaces and structs because the 
bounds of a pointer are often available adjacent to the pointer itself, e.g., in 
another parameter of the same function prototype, or in another field of the 
same struct declaration.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
+      // off-by-one error
+      for (size_t i = 0; i <= count; ++i)
+         p[i] = i;
+   }
+
+External bounds annotations include ``__counted_by``, ``__sized_by``, and 
``__ended_by``. These annotations do not change the pointer representation, 
meaning they do not have ABI implications.
+
+* ``__counted_by(N)`` : The pointer points to memory that contains ``N`` 
elements of the pointee type. ``N`` is an expression of integer type, which can be a 
simple reference to a declaration, a constant including calls to constant 
functions, or an arithmetic expression that has no side effects. The 
annotation cannot apply to pointers to incomplete types or types without size 
such as ``void *``.
----------------
rapidsna wrote:

> a constant including calls to constant functions -- only in C++, right?

And this document doesn't discuss the C++ specific model.

This includes a call to a function with `__attribute__((const))` when the call 
arguments are all constants. If the function definition doesn't respect 
`__attribute__((const))`, the evaluation of the count is undefined 
behavior. I'll clarify the text.
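
To make the case concrete, here is a minimal sketch of a count expression that calls a `__attribute__((const))` function with constant arguments. The names `table_len` and `sum_table` are hypothetical and exist only for illustration. The fallback `#define` is also an assumption: it lets the sketch build on toolchains where the annotation macro is not provided, in which case no bounds checks are inserted. Whether a particular compiler accepts this exact form under `-fbounds-safety` is not confirmed here.

```c
#include <stddef.h>

/* Assumption: on toolchains that do not provide the annotation macro,
   expand it to nothing so the code still compiles as plain C. */
#ifndef __counted_by
#define __counted_by(N)
#endif

/* Hypothetical helper. The const attribute promises that the result
   depends only on the arguments and that the call has no side effects. */
__attribute__((const)) static size_t table_len(size_t rows, size_t cols) {
  return rows * cols;
}

/* Every call argument is a constant, so the count is the fixed value 12. */
static int sum_table(const int *__counted_by(table_len(3, 4)) table) {
  int sum = 0;
  for (size_t i = 0; i < table_len(3, 4); ++i)
    sum += table[i];
  return sum;
}
```

A caller would pass a buffer of at least 12 `int` elements. If `table_len` read global state or had side effects despite the attribute, the count would no longer be well defined, which is the undefined-behavior case described above.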

https://github.com/llvm/llvm-project/pull/70749