JustinStitt wrote:
With all the review it is clear that we have to choose some new semantics for
OBTs. It is also important that OBTs are useful in their design and purpose for
many projects. OBTs should provide type-level overflow behavior handling. With
this goal in mind, there are two customers: 1) large existing codebases and 2)
new projects.
To help all kinds of projects I met with @kees again and we talked about
separating some behaviors into separate modes. We identified two modes that
would give the best path forward for the Linux kernel, other existing projects,
and new projects. We were thinking of a "compliant" mode and a "strict" mode.
Name bike-shedding welcome. :)
@ojhunt We see your concerns with a strict mode and want to address them by
ensuring code compatibility between all modes. Further below are some examples
and design principles for the modes. First, let's make the distinction clear:
The "compliant" mode would use traditional C promotion rules with the exception
that the OBT qualifier is persisted through implicit casts. This allows us to
get truncation signal during storage of less-than-int arithmetic results and
overflow signal on other results. This mode would be the introductory mode for
large projects that cannot make the direct jump to strict mode. @kees has shown
that this compliant mode would still provide useful signal in the Linux kernel,
where truncation accounts for a large percentage of the integer overflow flaws.
An example of compliant mode:
```c
typedef unsigned short __ob_trap tu16;
tu16 a = 65535; // USHRT_MAX
int b = 1;
tu16 c = a + b; // a and b promoted to "__ob_trap int", traps on truncated assignment
// a+b is implicitly promoted to __ob_trap int
// result of a+b is 65536
// 65536 doesn't fit within tu16's storage space, so we can trap on the assignment
```
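As a rough illustration of the check described above, here is a minimal sketch in plain C that emulates it by hand. The helper names `store_tu16` and `add_then_store` are hypothetical and only for illustration; this is not what the compiler emits. It assumes a 16-bit `unsigned short`, a 32-bit `int`, and the GCC/Clang builtin `__builtin_add_overflow`.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical helper: emulates the check on a store of an int-typed
// result into a 16-bit trapping type. Returns false where the real
// instrumentation would trap.
bool store_tu16(int wide, unsigned short *out) {
    if (wide < 0 || wide > USHRT_MAX)
        return false; // value would not survive the truncation
    *out = (unsigned short)wide;
    return true;
}

// Mirrors `tu16 c = a + b;` from the example above.
bool add_then_store(unsigned short a, int b, unsigned short *c) {
    int sum;
    // a is promoted to int first; an overflow of the int addition itself
    // is the "overflow signal on other results" case.
    if (__builtin_add_overflow((int)a, b, &sum))
        return false;
    return store_tu16(sum, c);
}
```

With `a = 65535` and `b = 1` the helper reports failure, which corresponds to the trap on assignment; with `a = 65534` it succeeds and stores 65535.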
The "strict" mode would require matching bitwidths and obt kinds which results
in no ambiguity and provides homogeneity of arithmetic results to gain full
visibility into potential overflows. For example, this matches the semantics of
Rust. Nothing implicit happens in this mode. To make this mode most useful,
casts to strict obt kinds would need to be instrumented for overflow (more on
this below).
Same example with strict mode:
```c
typedef unsigned short __ob_strict_trap tu16;
tu16 a = 65535; // USHORT_MAX
int b = some();
tu16 c = a + b; // error: a and b have different types
// solution 1:
// Make sure everything has the same type
tu16 a = 65535;
tu16 b = some(); // some() must also return tu16.
tu16 c = a + b; // OK, everything is the same type.
// solution 2:
// add explicit casts
tu16 a = 65535;
int b = some();
tu16 c = a + (tu16)b; // OK, everything is the same type...
// ... but we must avoid potential silent dataloss during c-style cast.
```
To make the "strict" mode usable, C-style casts would also need to be
instrumentable; otherwise we risk silent truncations. This cast
instrumentation would be used to get full signal across arithmetic
expressions being converted from compliant to strict mode.
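To make the idea of an instrumented cast concrete, here is a minimal sketch of the check such a cast would need to perform, written by hand in plain C. The function `cast_to_u8` is a hypothetical stand-in for an instrumented `(tu8)value` cast and returns `false` where the instrumentation would report data loss.

```c
#include <limits.h>
#include <stdbool.h>

// Hypothetical stand-in for an instrumented `(tu8)value` cast.
// Writes the converted value and returns true only when no data is lost.
bool cast_to_u8(int value, unsigned char *out) {
    if (value < 0 || value > UCHAR_MAX)
        return false; // an instrumented cast would flag this
    *out = (unsigned char)value;
    return true;
}
```

A plain `(unsigned char)INT_MAX` silently yields 255, whereas `cast_to_u8(INT_MAX, &r)` reports the loss.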
Take a look at this toy example, which shows the usefulness of strict obts:
they catch the other major class of integer overflow the Linux kernel wants
to catch:
```c
// vanilla C with -fwrapv (i.e. Linux kernel today)
int x = INT_MAX;
int y = (INT_MAX-99);
u8 sz = x * y; // sz is 100 due to wrap-around, no truncation
... = malloc(sz); // buggy malloc of 100 bytes
// compliant trap mode (would not catch this kind of overflow)
typedef unsigned char __ob_trap tu8;
int x = INT_MAX;
int y = (INT_MAX-99);
tu8 sz = x * y; // sz is 100 due to wrap-around, no truncation so no trap
... = malloc(sz); // buggy malloc 100 bytes
// strict trap mode
typedef unsigned char __ob_strict_trap tu8;
int x = INT_MAX;
int y = (INT_MAX-99);
tu8 sz = (tu8)x * (tu8)y; // strict mode forces same types, so add casts
// The casts will add instrumentation to catch data loss at runtime.
... = malloc(sz);
// When using strict obts the ultimate goal is to have code changed to all
// matching types instead of littering casts everywhere
// So, imagine a more opaque example with some() and other()
tu8 x = some(); // refactor these apis to use tu8
tu8 y = other();
tu8 sz = x * y; // now all types are the same with no casts locally.
... = malloc(sz);
```
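The arithmetic in the toy example can be checked by hand with a small sketch. The first two helpers show where the value 100 comes from by doing the multiply in `uint32_t`, where wrap-around is well defined (this assumes a 32-bit `int`). The third, `mul_tu8`, is a hypothetical stand-in for the final `tu8 sz = x * y;` with matching types, using the GCC/Clang builtin `__builtin_mul_overflow`. None of these names come from the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

// Emulates `int x * int y` under -fwrapv: the multiply is done on the
// unsigned bit patterns, which wraps modulo 2^32.
uint32_t wrapped_product_bits(int32_t x, int32_t y) {
    return (uint32_t)x * (uint32_t)y;
}

// Keeps only the low 8 bits, as an unchecked store into a u8 does.
uint8_t low_byte(uint32_t bits) {
    return (uint8_t)(bits & 0xFFu);
}

// Hypothetical stand-in for `tu8 sz = x * y;` once both operands are tu8.
// Returns false where a trapping multiply would trap.
bool mul_tu8(uint8_t x, uint8_t y, uint8_t *sz) {
    return !__builtin_mul_overflow(x, y, sz);
}
```

For `INT32_MAX * (INT32_MAX - 99)` the low byte of the wrapped product is 100, matching the `sz` value in the example. With same-typed operands, `mul_tu8(16, 16, &sz)` reports the overflow because 256 does not fit in 8 bits.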
Refactoring to use __ob_strict_trap is still safe/stable when obts are
unsupported, because the casts don't make anything worse. If, instead, we added
optional bit-width/kind mismatch warnings to the "compliant" mode, we run the
risk of bad casts (e.g. just "u8" above) being added, which would silently hide
the overflow and silence the bit-width warning.
It's also possible that we just need a stand-alone "strict" type qualifier that
prevents annotated types from participating in any implicit promotions. This
could then just be applied to the existing obts (or any other types).
So the process for an existing project would be to migrate some types (e.g.
size_t) via the compliant obts, and in other places (e.g. new types, new code,
APIs, etc), use the strict obts. This should provide the greatest flexibility
without compromising on coverage. For example:
```c
#if __has_attribute(overflow_behavior)
// "compliant" obt for an "existing" type
typedef unsigned long __ob_trap size_t;
// "strict" obt for a "new" type
typedef unsigned char __strict __ob_trap tu8;
#else
typedef unsigned long size_t;
typedef unsigned char tu8;
#endif
```
It is important that strict mode doesn't carry different conversion semantics.
It may only enforce stricter type rules requiring explicit casts or type changes.
```c
typedef unsigned char __strict __ob_trap tu8;
extern void some(int);
void foo(int x, int y) {
// must get 'x' and 'y' to be of type 'tu8'
// old compilers will build this code just fine (and trap on truncation)
// obt-enabled compilers will fail to build as there are mismatching types
tu8 a = x * y;
some(a);
}
```
... Now convert the code to build with 'strict' mode.
```c
void foo(int x, int y) {
// old compiler will still build this just fine
// obt-enabled compilers will now build (and instrument explicit casts for
// data loss)
// no difference in result between compilers
tu8 a = (tu8)x * (tu8)y;
some(a);
}
```
https://github.com/llvm/llvm-project/pull/148914
_______________________________________________
lldb-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits