On Fri, 7 Jul 2023, Richard Biener wrote:
I have implemented support for uninitialized memory in my translation
validator. But I am not sure how well this corresponds to the GIMPLE
semantics, so I have some questions...
My implementation tracks uninitialized bits. Use of uninitialized bits is
in general treated as UB (for example, `x + y` is UB if `x` or `y` has any
uninitialized bits), but there are a few exceptions:
* load/store: It is always OK to load/store values having uninitialized
bits.
* Phi nodes: Phi nodes propagates uninitialized bits.
definitely, I think plain copies would be OK as well
With "plain copies", do you mean treating it as it is always defined? That
would prevent optimizations such as transforming
int t;
if (1)
t = *p;
else
t = 0;
return t + 1;
to the equivalent of
return *p + 1;
because the latter is UB if *p is undefined, but the original is OK if phi
nodes always give us a fully defined value.
* selection: Instructions that are selecting an element (COND_EXPR,
VEC_PERM_EXPR, etc.) may select between values having uninitialized
bits, and the resulting value may have uninitialized bits. But the
condition/mask must not have uninitialized bits.
* Extraction: Instructions that are extracting bits (BIT_FIELD_REF etc.)
may have uninitialized bits in both the input/output.
* Insertion: Instructions that are constructing/inserting values
(COMPLEX_EXPR, etc.) may have uninitialized bits in both the
input/output.
Generally I think "moving" uninitialized bits in full or in part is OK.
Operating on them, including comparing them (with themselves)
is eventually asking for troubles.
Makes sense.
But I think we must restrict the definition of "move". For example, one
can see x * 2 as moving the uninitialized bits one step. And x + x and
x << 1 are the same as x * 2, so then they too would be defined? I would
prefer if all of them were UB if x contains uninitialized bits...
All other use of values having uninitialized bits are considered UB.
Does this behavior make sense?
The above seems to work fine so far, with one exception that can be seen
in gcc.c-torture/execute/pr108498-1.c. The test has an uninitialized bit
field
unsigned char c6:1, c7:1, c8:1, c9:3, c10:1;
which is written to as
x.c6 = 0;
x.c7 = 0;
x.c8 = 0;
x.c9 = 7;
The store merging pass changes this to
_71 = MEM <unsigned char> [(struct C *)&x + 8B];
_42 = _71 & 128;
_45 = _42 | 56;
and the translation validator is now complaining that the pass has
introduced UB that was not in the original IR (because the most
significant bit in _71 is uninitialized when passed to BIT_AND_EXPR).
I could solve this by allowing uninitialized bits in BIT_AND_EXPR and
BIT_OR_EXP, and propagating each bit according to
* `0 & uninit` is an initialized `0`
* `1 & uninit` is uninitialized
* `0 | uninit` is uninitialized
* `1 | uninit` is an initialized `1`
Is that the correct GIMPLE semantics?
I think the above "moves" the uninitialized MSB from memory to _45 so
that's OK.
Some "operations" like & 0 or & 1 give either defined values or
take the uninitialized bit unmodified (thus "move"). I think both
kinds are OK. Likewise + 0 or * 0/1 would be OK. What's not
OK is operations that use an (fully?) uninitialized value twice,
like x - x when x is uninitialized.
+ 0 and * 0/1 makes sense to me. One annoyance is that it will make my
tool slower as it must track undefined bits in more cases. But that is
not a valid reason for restricting the IR semantics...
I think we want that, as soon as the uninitialized value becomes
"part" of a then partly initialized value, it's value is "fixed".
With things like _Complex or vector the situation is a bit
of a grey area since we have ways to operate on elements.
Note that when we for example use a BIT_FIELD_REF to extract
the MSB from _42 above the value would be again fully undefined.
Do you mean that all operations on the values having "fixed" bits are
valid? I do not think that will work. The frozen value may be any value,
which mean that the program is nondeterministic. For example, if x has one
"fixed" uninitialized bit with the other bits 0, then code such as
if (x)
could "randomly" be either true or false, so the program will not really
have any defined semantic.
And if use of an extracted "fixed" uninitialized bit is UB, then the
following may be UB (if x or y has a "fixed" uninitialized least
significant bit)
bool b = (x & 1) != (y & 1)
while the equivalent
bool b = (x ^ y) & 1
is defined.
And things may be even more fun when arithmetic is "smearing" the fixed
uninitialized values over the variable -- it will then be impossible to
determine if the extracted bits are OK or not when extracted...
I hope this makes some sense, it's a tricky area.
Richard.