https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114217

--- Comment #11 from Akihiko Odaki <akihiko.odaki at daynix dot com> ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Akihiko Odaki from comment #8)
> > It would certainly workaround the issue, but it's only dirtier and brings no
> > benefit except suppressed UBSan errors. Why not allow
> > get_unaligned(&entry->offset) when UBSan does not complain about that hack?
> 
> The purpose of the sanitizer is to find undefined behavior, and it found one.
> If you don't want it to find undefined behavior, don't use it.
> 
> > > If you want something that will be valid even in C, don't pass struct
> > > dir_entry *entry
> > > argument, but void *entry instead, and use e.g.
> > > __get_unaligned_t(__typeof(((struct dir_entry *)0)->offset), ((char
> > > *)entry)+offsetof(struct dir_entry, offset)))
> > > You can surely hide that all under some macro.
> > 
> > The definition still involves UB for ((struct dir_entry *)0)->offset.
> > Perhaps __typeof() may be considered as an exception, but what if offsetof()
> > is defined as follows?
> > 
> > #define offsetof(T, x) ((uintptr_t)&(((T *)0)->x))
> 
> typeof nor sizeof operators don't evaluate their arguments unless the
> expressions have VLA type.  So, if there would be undefined behavior if they
> were evaluated, it doesn't matter because they aren't.  So, e.g.
> 
> void
> foo (int n, int *p)
> {
>   typedef int A[n];
>   A b[4];
>   typedef int C[10];
>   C d[4];
>   int i = 0;
>   int j = 0;
>   typeof (b[++i]) e;
>   typeof (d[++j]) f;
>   typeof (d[100][4]) g;
>   p[0] = i;
>   p[1] = j;
> }
> 
> doesn't invoke undefined behavior and sets p[0] to 1 and p[1] to 0, b[++i]
> had VLA type, so it got evaluated, the other 2 didn't, so ++j wasn't
> evaluated and the d[100]
> UB wasn't triggered, as if the program did if (n == 54) d[100][4]; and n at
> runtime was
> never 54.  __typeof is a GNU extension but behaves the same way.
> 
> offsetof is a standard C macro.  It is not defined in the standard as
> dereferencing NULL pointer, but as if there was an object of the T type and
> it computed the difference between the address of the x member of it and the
> start of the object.
> If offsetof is defined as
> #define offsetof(T, x) ((size_t)(uintptr_t)&(((T *)0)->x))
> by the implementation (and yes, e.g. GCC < 4 used to define it similarly),
> then obviously the compiler can't assume there is undefined behavior on such
> expressions,
> but it didn't at that point, there was no UBSAN, such expressions evaluated
> to a constant expression early before anything could have considered there
> might be UB and after it was just the resulting constant.
> If the implementation doesn't have some exception that considers expressions
> like
> &(((T *)0)->x) valid, then it can't use such offsetof definition and needs
> to use something else, like GCC uses __builtin_offsetof.  That said, GCC
> AFAIK still treats expressions like the above as offsetof-like (because
> there has been a lot of such code in the wild for decades), folds them to
> constant right away and doesn't complain about it with ubsan; clang does
> complain about it though and doesn't treat it that way.
> Anyway, if you use something like ((size_t)(uintptr_t)&(((T *)0)->x)), per C
> rules you are invoking UB, while if you use offsetof(T, x), it is well
> defined.

So there are two constructs invoking UBs but ignored by UBSan: 1) construction
of misaligned pointers that do not involve a struct member, and 2) pointer
arithmetic with a member of an invalid struct pointer as done by old
offsetof().

If UBSan catches get_unaligned(&entry->offset), why not catch those constructs
too? Perhaps 2) may not be relevant for UBSan since it's done at compile time,
but 1) is.

Whether 1) should be caught is a more complicated question. From a certain
viewpoint, it should be caught since it's UB, and future GCC and other
compilers may introduce incompatible optimization; there will be real bugs if
that happens.

On the other hand, the current GCC does not perform such an optimization, and
clang does not either at least when a packed struct is used or
__builtin_memcpy() is used and its argument is casted to void * or char *
(c.f., https://github.com/llvm/llvm-project/issues/83710). In that sense,
catching 1) brings only noises.

In any case, the answer to the question of whether 1) should be caught should
not be different from whether get_unaligned(&entry->offset) should be caught.

Reply via email to