On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov <[email protected]> wrote:
>
> On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <[email protected]> wrote:
>
> Sun Jian,
> I asked to do a _minimal_ tweak to pyperf600.
> What you did is a drastic change. Pls don't hack tests
> just to make them pass. The tests have to be meaningful
> and test coverage shouldn't degrade.
>

Hi Alexei, Paul,

I spent some more time looking into this.

Comparing unmodified pyperf600 bytecode between clang-18 and clang-20,
I see fewer instructions with clang-20 and nearly the same number of
branches:

  clang-18: 90134 lines of disassembly, 6090 gotos
  clang-20: 78369 lines of disassembly, 6085 gotos

So this does not look like a simple program-size increase. What seems
to change is the branch layout in the unrolled loop body, which
apparently makes the verifier DFS go deeper before pruning.

One useful data point is that a single __on_event() copy does load
successfully (that was my v2), while with 2 or more copies it
consistently fails at exactly 8193 jumps. In other words, the verifier
hits the jump-sequence limit before reaching the second copy.

I also tried a range of source-level mitigations, but so far I couldn't
find one that preserves the test intent and keeps pyperf600 comparable
to the other variants:

- UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
  fail at 8193; 121-145 fail to compile; 146-150 compile but still
  fail at 8193
- early break/goto on !frame_ptr: insufficient for pyperf600, and also
  hurts pyperf600_nounroll by adding branch points to the
  600-iteration loop
- wrapping 5x __on_event() in a non-unrolled loop: verifier still
  unrolls it
- making get_frame_data() __noinline: still fails
- moving the unwind loop into a __noinline subprog: still fails
- SUBPROGS / __on_event as __noinline: still fails; codegen changes,
  but the verifier still hits 8193

Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting
it to work with STACK_MAX_LEN reduced to 180, which would make it too
close to pyperf180.

The only source change I found that passes is reducing __on_event() to
a single copy, but that clearly weakens the test, as Alexei pointed
out.

At this point, I don't have a source-level fix that preserves the test
intent.

Regards,
Sun Jian
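
P.S. In case anyone wants to reproduce the clang-18 vs clang-20
comparison, here is a minimal sketch of one way to gather such counts.
It is not necessarily the exact procedure behind the numbers above. It
assumes the disassembly was saved as text (for example from
llvm-objdump -d on the test object) and that each BPF jump is printed
with the "goto" mnemonic; the function name disasm_stats is made up
for illustration.

```python
import re

# Assumption: llvm-objdump prints BPF jumps using the "goto" mnemonic,
# e.g. "if r1 == 0 goto +5 <LBB0_2>" or a plain "goto +5".
GOTO_RE = re.compile(r"\bgoto\b")


def disasm_stats(text):
    """Return (total_lines, lines_containing_goto) for a disassembly dump."""
    lines = text.splitlines()
    gotos = sum(1 for line in lines if GOTO_RE.search(line))
    return len(lines), gotos
```

Calling disasm_stats() on the dump from each compiler and comparing
the two tuples separates "the program got bigger" from "the branch
layout changed", which is the distinction drawn above.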
