[PR] test: fix InterpreterBenchmark so it produces trustworthy numbers [pekko]

via GitHub Wed, 20 May 2026 23:42:02 -0700


He-Pin opened a new pull request, #2985:
URL: https://github.com/apache/pekko/pull/2985


   ### Motivation
   `InterpreterBenchmark` had two independent bugs that made its results 
unreliable, which becomes a problem the moment anyone wants to evaluate a 
`GraphInterpreter`-touching change against it.
   
   1. The benchmark body was `new GraphInterpreterSpecKit { new TestSetup { ... } }`.
Because that ran inside the `@Benchmark` method, every invocation built (and never
tore down) a fresh `ActorSystem`. Long iterations exhausted native threads, and
JMH ended up reporting empty results once the JVM ran out of resources.
   2. Inside the assembly we used `GraphStages.identity[Int]` once per slot, 
but `GraphStages.identity` is a singleton whose `Inlet`/`Outlet` shape is 
shared across every reference. Chaining N copies (`numberOfIds = 5/10`) 
collapses to a single shape and mis-wires the connections; the run logged a 
flood of `Cannot pull port twice` errors and ended up reporting nonsense 
throughput (5/10-stage configs faster than the 1-stage one).
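   The second bug can be illustrated with a dependency-free toy model. It assumes, as described above, that the assembly wires stages by their port objects and that ports compare by reference. `Port`, `Stage`, and `WiringToyModel` are hypothetical names, not Pekko types. The point is that N references to one shared stage contribute only one set of ports, so a chain of N collapses to a single stage.

```scala
// Toy model only (hypothetical types, not Pekko's API).
// Ports compare by reference because Port does not override equals.
final class Port(val name: String)

final class Stage(val name: String) {
  val in = new Port(s"$name.in")
  val out = new Port(s"$name.out")
}

object WiringToyModel {
  // One shared instance, analogous to reusing a singleton stage in every slot.
  val SingletonIdentity: Stage = new Stage("identity")

  /** Connects source -> s1 -> ... -> sN -> sink, keyed by upstream outlet. */
  def wire(stages: Seq[Stage]): Map[Port, Port] = {
    val source = new Port("source.out")
    val sink = new Port("sink.in")
    val outs = source +: stages.map(_.out)
    val ins = stages.map(_.in) :+ sink
    // A repeated outlet key overwrites the earlier connection.
    outs.zip(ins).toMap
  }

  def main(args: Array[String]): Unit = {
    val n = 5
    val shared = wire(Seq.fill(n)(SingletonIdentity))
    val distinct = wire(Seq.fill(n)(new Stage("id")))
    println(s"connections with shared stage:    ${shared.size}")   // 2
    println(s"connections with distinct stages: ${distinct.size}") // 6
  }
}
```

With the shared stage, the intermediate `out -> in` links all hit the same key and collapse into one. Only `source -> stage -> sink` survives, which is one stage rather than five.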
   
   ### Modification
   - Make `InterpreterBenchmark` itself extend `GraphInterpreterSpecKit` so 
JMH's `@State(Scope.Benchmark)` lifecycle reuses one `ActorSystem` across 
invocations, and add `@TearDown(Level.Trial)` to terminate it cleanly.
   - Define a local `IdentityStage[T]` with its own `Inlet`/`Outlet` per 
instance and use `Vector.fill(numberOfIds)(new IdentityStage[Int])` so each 
slot in the chain is a distinct stage with a distinct shape.
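   The two changes might look roughly like the following minimal sketch. It assumes Pekko's public `GraphStage` API (`org.apache.pekko.stream.stage`) and JMH annotations are on the classpath. This `IdentityStage` is a guess at the shape of the PR's local stage, not its actual code. `InterpreterLifecycleSketch` is hypothetical: it creates an `ActorSystem` directly instead of extending the internal `GraphInterpreterSpecKit`, and its benchmark body only builds the chain, whereas the real benchmark pushes 100k elements through it.

```scala
import scala.concurrent.Await
import scala.concurrent.duration._

import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.{ Attributes, FlowShape, Inlet, Outlet }
import org.apache.pekko.stream.stage.{ GraphStage, GraphStageLogic, InHandler, OutHandler }
import org.openjdk.jmh.annotations._

// Pass-through stage whose Inlet/Outlet are created per instance, so every
// `new IdentityStage[T]` has its own ports (unlike a shared singleton stage).
final class IdentityStage[T] extends GraphStage[FlowShape[T, T]] {
  val in: Inlet[T] = Inlet[T]("IdentityStage.in")
  val out: Outlet[T] = Outlet[T]("IdentityStage.out")
  override val shape: FlowShape[T, T] = FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {
      override def onPush(): Unit = push(out, grab(in)) // forward each element unchanged
      override def onPull(): Unit = pull(in)            // propagate demand upstream
      setHandlers(in, out, this)
    }
}

// Hypothetical skeleton: the ActorSystem is a field of the @State object, so JMH
// creates it once per trial (per @Param value) instead of once per invocation,
// and @TearDown(Level.Trial) terminates it when the trial ends.
@State(Scope.Benchmark)
class InterpreterLifecycleSketch {
  @Param(Array("1", "5", "10"))
  var numberOfIds: Int = 0

  implicit val system: ActorSystem = ActorSystem("InterpreterLifecycleSketch")

  @TearDown(Level.Trial)
  def shutdown(): Unit = Await.result(system.terminate(), 10.seconds)

  @Benchmark
  def buildChain(): Vector[IdentityStage[Int]] =
    // Placeholder body: one distinct stage (and so one distinct shape) per slot.
    Vector.fill(numberOfIds)(new IdentityStage[Int])
}
```

Because each `IdentityStage` allocates its own ports, `Vector.fill(n)(new IdentityStage[Int])` yields n distinct inlets and n distinct outlets, which is what a chain of n stages requires.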
   
   No changes to production code — this PR only fixes the benchmark.
   
   ### Result
   The benchmark now runs to completion without leaking actor systems and 
produces stable, monotonic numbers (throughput decreases as `numberOfIds` 
grows, as expected). This restores it as a usable baseline for subsequent 
`GraphInterpreter` work.
   
   JMH on this branch (JDK 25, G1, single thread, `-i 5 -wi 3 -f 1 -t 1`):
   
   ```
   Benchmark                                             (numberOfIds)   Mode  Cnt      Score      Error   Units
   InterpreterBenchmark.graph_interpreter_100k_elements              1  thrpt    5  45238.227 ± 3143.242  ops/ms
   InterpreterBenchmark.graph_interpreter_100k_elements              5  thrpt    5  10526.376 ±  151.239  ops/ms
   InterpreterBenchmark.graph_interpreter_100k_elements             10  thrpt    5   5350.558 ±  192.965  ops/ms
   ```
   
   Before the fix, the 5- and 10-stage rows both scored higher than the 1-stage
row (i.e. the wrong direction), because the singleton-shape bug meant the chain
was not actually N stages long.
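   The "throughput decreases as `numberOfIds` grows" claim can be checked directly against the scores in the table. The sketch below assumes each JMH op is one element passing through the whole chain, so score × `numberOfIds` approximates stage traversals per millisecond. That interpretation, and the names `ThroughputSanityCheck` and `stageHopsPerMs`, are assumptions for illustration.

```scala
object ThroughputSanityCheck {
  // Scores (ops/ms) copied from the JMH table, keyed by numberOfIds.
  val scores: Seq[(Int, Double)] = Seq(1 -> 45238.227, 5 -> 10526.376, 10 -> 5350.558)

  /** True when throughput strictly decreases as numberOfIds grows. */
  def decreasesWithStages(xs: Seq[(Int, Double)]): Boolean =
    xs.sortBy(_._1).sliding(2).forall {
      case Seq((_, a), (_, b)) => a > b
      case _                   => true // fewer than two rows: nothing to compare
    }

  /** Score times stage count; roughly flat if each stage costs about the same. */
  def stageHopsPerMs(xs: Seq[(Int, Double)]): Seq[(Int, Double)] =
    xs.map { case (n, score) => n -> n * score }

  def main(args: Array[String]): Unit = {
    println(s"monotonic: ${decreasesWithStages(scores)}")
    stageHopsPerMs(scores).foreach { case (n, hops) =>
      println(f"$n%2d stages: $hops%.0f stage-hops/ms")
    }
  }
}
```

On the post-fix numbers, the products stay in a narrow band (about 45k to 54k), which is consistent with cost growing roughly linearly in the number of stages. The pre-fix pattern, where more stages scored higher, fails the monotonicity check.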
   
   This is a benchmark-correctness fix, not a performance improvement.
   
   ### Tests
   - `sbt 'bench-jmh/compile'`
   - `sbt 'bench-jmh/headerCheck; bench-jmh/scalafmtCheck'`
   - `sbt 'bench-jmh/Jmh/run -i 5 -wi 3 -f 1 -t 1 -rf json -rff /tmp/jmh.json .*InterpreterBenchmark.*'`
(completes cleanly and produces the scores above)
   
   ### References
   None. This benchmark-only fix surfaced while preparing to evaluate
`GraphInterpreter` micro-optimizations against a trustworthy baseline.

