On Sun, Mar 8, 2026 at 12:12 AM Eduard Zingerman <[email protected]> wrote:
>
> On Sun, 2026-03-08 at 13:55 +0800, sun jian wrote:
> > On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov
> > <[email protected]> wrote:
> > >
> > > On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <[email protected]> wrote:
> > > Sun Jian,
> > > I asked to do a _minimal_ tweak to pyperf600.
> > > What you did is a drastic change. Pls don't hack tests
> > > just to make them pass. The tests have to be meaningful
> > > and test coverage shouldn't degrade.
> > >
> >
> > Hi Alexei, Paul,
> >
> > I spent some more time looking into this.
> >
> > Comparing unmodified pyperf600 bytecode between clang-18 and clang-20, I
> > see fewer instructions with clang-20 and nearly the same number of
> > branches:
> >
> > clang-18: 90134 lines of disassembly, 6090 gotos
> > clang-20: 78369 lines of disassembly, 6085 gotos
> >
> > So this does not look like a simple program-size increase. What seems to
> > change is the branch layout in the unrolled loop body, which seems to
> > make the verifier DFS go deeper before pruning.
> >
> > One useful data point is that a single __on_event() copy does load
> > successfully (that was my v2), while with 2 or more copies it
> > consistently fails at exactly 8193 jumps. In other words, the verifier
> > hits the jump-sequence limit before reaching the second copy.
> >
> > I also tried a range of source-level mitigations, but so far I couldn't
> > find one that preserves the test intent and keeps pyperf600 comparable
> > to the other variants:
> >
> > - UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
> > fail at 8193; 121-145 fail to compile; 146-150 compile but still fail
> > at 8193
> > - early break/goto on !frame_ptr: insufficient for pyperf600, and also
> > hurts pyperf600_nounroll by adding branch points to the 600-iteration loop
> > - wrapping 5x __on_event() in a non-unrolled loop: verifier still unrolls it
> > - making get_frame_data() __noinline: still fails
> > - moving the unwind loop into a __noinline subprog: still fails
> > - SUBPROGS / __on_event as __noinline: still fails; codegen changes,
> > but the verifier still hits 8193
> >
> > Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting it
> > to work with STACK_MAX_LEN reduced to 180, which would make it too close
> > to pyperf180.
> >
> > The only source change I found that passes is reducing __on_event() to a
> > single copy, but that clearly weakens the test as pointed out.
> >
> > At this point, I don't have a source-level fix that preserves the test
> > intent.
>
> Hi Sun,
>
> I have an old investigation into the cause of the pyperf600 failure, from March 2024.
> Attaching it to the email. The discussion happened off-list.
> The source-level "mitigation" I found back then still stands:
>
>   --- a/tools/testing/selftests/bpf/progs/pyperf.h
>   +++ b/tools/testing/selftests/bpf/progs/pyperf.h
>   @@ -97,8 +97,15 @@ static __always_inline bool get_frame_data(void *frame_ptr, PidData *pidData,
>                               frame_ptr + pidData->offsets.PyFrameObject_code);
>
>           // read data from PyCodeObject
>   +#if __BPF_CPU_VERSION__ < 4
>           if (!frame->f_code)
>                   return false;
>   +#else
>   +        asm volatile goto("if %[f_code] == 0 goto %l[has_f_code];"
>   +                             :: [f_code]"r"(frame->f_code) :: has_f_code);
>   +        return false;
>   +has_f_code:
>   +#endif
>
> (One needs cpuv4 because jump instructions whose offsets exceed the
>  16-bit range are only possible with cpuv4.)
>
> The decision back then was that the "mitigation" is too brittle to
> apply and we should leave the test as-is, hoping that the verifier would
> get smarter some day and be able to load the program.

Back then the hope was that it would be fixed imminently,
but 2 years later it still fails. So please send your workaround.

I prefer to have 'test_progs' passing all tests without a denylist.
