On Sat, Mar 7, 2026 at 12:23 AM Alexei Starovoitov
<[email protected]> wrote:
>
> On Fri, Mar 6, 2026 at 8:15 AM Paul Chaignon <[email protected]> wrote:
> Sun Jian,
> I asked to do a _minimal_ tweak to pyperf600.
> What you did is a drastic change. Pls don't hack tests
> just to make them pass. The tests have to be meaningful
> and test coverage shouldn't degrade.
>

Hi Alexei, Paul,

I spent some more time looking into this.

Comparing unmodified pyperf600 bytecode between clang-18 and clang-20, I
see fewer instructions with clang-20 and nearly the same number of
branches:

clang-18: 90134 lines of disassembly, 6090 gotos
clang-20: 78369 lines of disassembly, 6085 gotos
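
For anyone who wants to repeat the comparison, here is a minimal sketch of
how such counts can be gathered from a disassembly dump. It assumes the
text comes from something like "llvm-objdump -d" on the test object, that
"lines" means all lines of the dump, and that a "goto" is any line
containing the word goto. The helper name disasm_stats is hypothetical,
and the figures above may have been produced differently.

```python
import re

# Assumption: a jump shows up in the dump as a line containing the word
# "goto", e.g. "goto +1 <LBB0_3>" or "if r6 == 0 goto +2 <LBB0_2>".
GOTO_RE = re.compile(r"\bgoto\b")


def disasm_stats(text):
    """Return (total line count, number of lines containing 'goto')."""
    lines = text.splitlines()
    gotos = sum(1 for line in lines if GOTO_RE.search(line))
    return len(lines), gotos


if __name__ == "__main__":
    import sys

    # Usage: python3 disasm_stats.py < disassembly.txt
    total, gotos = disasm_stats(sys.stdin.read())
    print(f"{total} lines of disassembly, {gotos} gotos")
```

Running it once per compiler on the same object gives the two numbers to
compare side by side.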

So this does not look like a simple program-size increase. What appears
to change is the branch layout in the unrolled loop body, which likely
makes the verifier's DFS go deeper before pruning.

One useful data point is that a single __on_event() copy does load
successfully (that was my v2), while with 2 or more copies it
consistently fails at exactly 8193 jumps. In other words, the verifier
hits the jump-sequence limit before reaching the second copy.

I also tried a range of source-level mitigations, but so far I couldn't
find one that preserves the test intent and keeps pyperf600 comparable
to the other variants:

- UNROLL_COUNT tuning: 99 does not compile; 100-120 compile but still
  fail at 8193; 121-145 fail to compile; 146-150 compile but still fail
  at 8193
- early break/goto on !frame_ptr: insufficient for pyperf600, and also
  hurts pyperf600_nounroll by adding branch points to the 600-iteration
  loop
- wrapping 5x __on_event() in a non-unrolled loop: verifier still
  unrolls it
- making get_frame_data() __noinline: still fails
- moving the unwind loop into a __noinline subprog: still fails
- SUBPROGS / __on_event as __noinline: still fails; codegen changes,
  but the verifier still hits 8193

Paul also mentioned trying STACK_MAX_LEN/UNROLL_COUNT and only getting it
to work with STACK_MAX_LEN reduced to 180, which would make it too close
to pyperf180.

The only source change I found that passes is reducing __on_event() to a
single copy, but, as Alexei pointed out, that clearly weakens the test.

At this point, I don't have a source-level fix that preserves the test
intent.

Regards,
Sun Jian
