On Fri, Jun 6, 2025 at 5:44 PM Dave Hansen <[email protected]> wrote:
>
> On 6/6/25 02:17, Jiri Slaby wrote:
> > Given this is the second time I hit a bug with this, perhaps introduce
> > an EXPERIMENTAL CONFIG option, so that random users can simply disable
> > it if an issue occurs? Without the need of patching random userspace and
> > changing random kernel headers?
>
> What about something like the attached (untested) patch? That should at
> least get folks back to the old, universal working behavior even when
> using new compilers.

IMO the commit message is unnecessarily dramatic. The "nasty bugs" were
in fact:

- an unfortunate mix of clang < 19 and new gcc-14 [1], fixed by
  robustifying the detection of typeof_unqual

[1] https://lore.kernel.org/lkml/ca+g9fyup2bhndvjwfmm6+8y8uyk74qcw-2hsfyrzjdfiq5d...@mail.gmail.com/

- sparse doesn't understand the new keyword; there is a patch at [2],
  but sparse is effectively unmaintained, so a workaround is in place

[2] https://lore.kernel.org/linux-sparse/[email protected]/

- genksyms didn't understand the new keyword, fixed by [3]

[3] https://lore.kernel.org/lkml/174461594538.31282.5752735096854392083.tip-bot2@tip-bot2/

- a performance regression, again due to the unfortunate usage of old
  gcc-13 [4]. The new gcc-14 would break compilation due to the missing
  __percpu qualifier. This is one example where the new checks would
  have caught the issue during development. Fixed with the help of
  gcc-14.

[4] https://lore.kernel.org/all/CAADnVQ+iFBxauKq99=-Xk+BdG+Lv=xgvwi1dc4fpg0utmxj...@mail.gmail.com/

- the issue in this thread, already fixed/worked around. Looking at the
  fix, I don't think gcc is at fault, but I speculate that some invalid
  assumption about the DWARF representation of variables in a
  non-default address space could be at play. I'll look at this one in
  some more detail.

Please also note that besides the above issues, the GCC type system and
the related checks around named address spaces were rock solid; there
were *zero* bugs regarding __percpu variables, and the referred patch
moves *all of them* to the __seg_gs named address space. The patch
builds on equally stable and now well-proven GCC named address space
support, so in my opinion, it *is* ready for prime time. As demonstrated
in the above list of issues, it was *never* the compiler at fault.

Let me reiterate what the patch brings to the table. It prevents invalid
references of per-CPU variables to non-percpu locations.
One missing percpu dereference can have disastrous consequences (all
CPUs will access data in the shared space). Currently, the safety builds
on checking sparse logs, but sparse errors don't break the build. With
the new checks in place, *every* invalid access is detected and breaks
the build with some 50 lines of errors.

Hiding these checks behind the CONFIG_EXPERT option defeats the
intention of the patch. IMO, they should always be enabled, so that the
errors mentioned in the previous paragraph are caught already at
development time.

I'm much more inclined to James' proposal. Maybe we can disable these
checks in the v6.15 stable series, but leave them in v6.16? This would
leave a couple of months for distributions to update libbpf.

Thanks,
Uros.
