On Sun, Mar 8, 2026 at 12:12 AM Eduard Zingerman <[email protected]> wrote:
>
> On Sun, 2026-03-08 at 13:55 +0800, sun jian wrote:
> > On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov
> > <[email protected]> wrote:
> > >
> > > On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <[email protected]>
> > > wrote:
> > > Sun Jian,
> > > I asked to do a _minimal_ tweak to pyperf600.
> > > What you did is a drastic change. Pls don't hack tests
> > > just to make them pass. The tests have to be meaningful
> > > and test coverage shouldn't degrade.
> > >
> >
> > Hi Alexei, Paul,
> >
> > I spent some more time looking into this.
> >
> > Comparing unmodified pyperf600 bytecode between clang-18 and clang-20, I
> > see fewer instructions with clang-20 and nearly the same number of
> > branches:
> >
> >   clang-18: 90134 lines of disassembly, 6090 gotos
> >   clang-20: 78369 lines of disassembly, 6085 gotos
> >
> > So this does not look like a simple program-size increase. What seems to
> > change is the branch layout in the unrolled loop body, which seems to
> > make the verifier DFS go deeper before pruning.
> >
> > One useful data point is that a single __on_event() copy does load
> > successfully (that was my v2), while with 2 or more copies it
> > consistently fails at exactly 8193 jumps. In other words, the verifier
> > hits the jump-sequence limit before reaching the second copy.
> >
> > I also tried a range of source-level mitigations, but so far I couldn't
> > find one that preserves the test intent and keeps pyperf600 comparable
> > to the other variants:
> >
> > - UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
> >   fail at 8193; 121-145 fail to compile; 146-150 compile but still fail
> >   at 8193
> > - early break/goto on !frame_ptr: insufficient for pyperf600, and also
> >   hurts pyperf600_nounroll by adding branch points to the 600-iteration loop
> > - wrapping 5x __on_event() in a non-unrolled loop: verifier still unrolls it
> > - making get_frame_data() __noinline: still fails
> > - moving the unwind loop into a __noinline subprog: still fails
> > - SUBPROGS / __on_event as __noinline: still fails; codegen changes,
> >   but the verifier still hits 8193
> >
> > Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting it
> > to work with STACK_MAX_LEN reduced to 180, which would make it too close
> > to pyperf180.
> >
> > The only source change I found that passes is reducing __on_event() to a
> > single copy, but that clearly weakens the test as pointed out.
> >
> > At this point, I don't have a source-level fix that preserves the test
> > intent.
>
> Hi Sun,
>
> I have an old investigation for the pyperf600 failure reason from March 2024.
> Attaching it to the email. The discussion happened off-list.
> The source-level "mitigation" I found back then still stands:
>
> --- a/tools/testing/selftests/bpf/progs/pyperf.h
> +++ b/tools/testing/selftests/bpf/progs/pyperf.h
> @@ -97,8 +97,15 @@ static __always_inline bool get_frame_data(void
> *frame_ptr, PidData *pidData,
>                             frame_ptr +
>                             pidData->offsets.PyFrameObject_code);
>
>         // read data from PyCodeObject
> +#if __BPF_CPU_VERSION__ < 4
>         if (!frame->f_code)
>                 return false;
> +#else
> +       asm volatile goto("if %[f_code] == 0 goto %l[has_f_code];"
> +                         :: [f_code]"r"(frame->f_code) :: has_f_code);
> +       return false;
> +has_f_code:
> +#endif
>
> (One needs cpuv4 because of the jump instructions exceeding 16-bit
> offset ranges are only possible with cpuv4).
>
> The decision back then was that the "mitigation" is too brittle to
> apply and we should leave the test as-is, hoping that verifier would
> get smarter some day and be able to load the program.

Back then the hope was that it would be fixed imminently, but 2 years
later it still fails. So please send your workaround.
I prefer to have 'test_progs' passing all tests without a denylist.
