> Thanks for running these, Matthew. Alas, I cannot reproduce the
> difference. How often did you run to reproduce the noticeable results
> yourself?

I did re-run the entire benchmark 2-3 times. There was definitely a
deviation of around ±5%, but there was a general trend in which the
inliner version was faster.

> Here are a few more ideas:

> - rather clock the CPU down and keep track of the actual frequencies
> while running (e.g. using perf stat), instead of turning the fans up.
> There might still be thermal reservoirs of unknown size and state even
> with the highest fan settings, especially in laptops.
> - always run more than one fork; many benchmarks have (at least)
> bimodal distributions of performance in steady state, varying by a few
> percent. If you know there's a multimodal distribution, make a rough
> estimate of how many forks you need to be really sure you aren't
> comparing one coin flip with another.
> - if the error bars overlap a lot, that's a sign that you need to run
> more benchmarks for validation (JMH reports a large 99.9% CI, but make
> sure you are confident the interval itself makes sense; 3 measurements
> are really the bare minimum for those to make any sense at all)

Yes, I am aware of these, but since I am overseas the benchmarks are
being run on an M1 laptop, which is not ideal (hence why I said
beforehand not to take the results as gospel). AFAIK, due to how the M1
is designed there isn't a proper way to do things like freeze clocks;
the best you can do is make sure the thermals stay as constant as
possible.

At home I have a desktop set up specifically for benchmarking, i.e.
frozen clocks in the BIOS plus scheduler tweaks etc., but I don't have
access to it now. This is why I am pushing for a proper dedicated HW
solution.
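
On the forks point: when I am back on that machine I will bump the fork
and iteration counts directly in the annotations rather than trusting a
single fork. A minimal sketch of what I mean (the benchmark class and
the counts here are illustrative, not one of the actual pekko-http
benchmarks):

    import java.util.concurrent.TimeUnit
    import org.openjdk.jmh.annotations._

    // Illustrative numbers only: 5 forks so a bimodal steady state shows
    // up as distinct per-fork results instead of being averaged away, and
    // enough measurement iterations that the 99.9% CI JMH reports rests
    // on more than the bare minimum of samples.
    @State(Scope.Benchmark)
    @BenchmarkMode(Array(Mode.AverageTime))
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Fork(5)
    @Warmup(iterations = 10, time = 1)
    @Measurement(iterations = 20, time = 1)
    class InlinerComparisonBench {
      private var xs: List[Int] = _

      @Setup
      def setup(): Unit = xs = (1 to 1000).toList

      @Benchmark
      def foldSum(): Int = xs.foldLeft(0)(_ + _)
    }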

> Also the result seems somewhat cherry-picked as e.g. most of the
> LineParserBenchmarks were slightly (up to 7%) better without inlining
> (speaking about your results in all of the above).

Yes, I also noticed this. From quickly grokking the results, the
reverse case was less frequent, but I think we are in agreement that we
shouldn't draw any definite conclusions here.

> Btw. the main improvement that the inliner promises is that it can
> avoid a particular kind of megamorphic call site in higher-order
> function use, i.e. Scala collection usage. In most performance
> critical situations, this has been manually optimized before where it
> turned up in profiles. All the other expensive places of megamorphic
> call sites (dispatchers, stream GraphInterpreter, routing DSL) are
> unfortunately not "static enough" that an AOT inliner could pick them
> up with static analysis.

Definitely agreed, the kicker being the "performance critical
situations". The Scala 2 inliner is having an impact: when enabled, it
automatically inlines code that it knows won't get inlined by the JVM's
JIT (particularly on earlier JDKs like 1.8; I hear later JDKs/GraalVM do
a better job of this), and as you can tell from the bytecode diff there
were a lot of cases like this. This is what I meant earlier when I said
I would be surprised if there was a significant difference; I would
presume that the Akka team already did their best to optimize any actual
performance-critical hotspots.
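
To make the megamorphic-call-site point concrete for anyone following
along, this is roughly the shape of code the inliner targets (a
contrived sketch, not something lifted from the actual bytecode diff):

    // Contrived sketch of the pattern, not Pekko code. f(f(x)) compiles
    // to virtual Function1.apply calls; across a whole codebase that call
    // site sees many different lambda classes, goes megamorphic, and the
    // JIT stops inlining it. With -opt:l:inline (and a matching
    // -opt-inline-from pattern) scalac copies the @inline method's body
    // into each caller, so each copy sees exactly one lambda class.
    object Hofs {
      @inline def applyTwice(x: Int, f: Int => Int): Int = f(f(x))
    }

    object Caller {
      // After inlining plus closure elimination this is roughly
      // (x + 1) + 1, with no Function1 allocation or apply() dispatch.
      def incTwice(x: Int): Int = Hofs.applyTwice(x, _ + 1)
    }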

In any case, I would still opt for leaving the inliner in there. I would be
extremely
shocked if there are any correctness issues given that its being used for
at least
half a decade in many projects hence the "free lunch" argument and the
complexity is not that high (we are dealing with some extra scalac flags)
and
we are almost finished with enabling it. Also in in terms of timing now is
the
best time, giving Pekko 1.1.x is on the horizon, the intent of doing
these changes
now is so that they won't be done later at more inopportune times
(I am predicting Pekko 1.1.x to last a while).
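
For context on the "extra scalac flags" part, the whole change is on the
order of the following (pre-2.13.9 flag spelling; the inline-from
pattern is illustrative rather than quoted from the PR):

    // build.sbt (sketch): enable the Scala 2 inliner, but only allow
    // inlining from Pekko's own packages so that no dependency bytecode
    // gets baked into our artifacts.
    ThisBuild / scalacOptions ++= Seq(
      "-opt:l:inline",
      "-opt-inline-from:org.apache.pekko.**"
    )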

One obvious advantage the inliner does have is for newly contributed
code: it can take care of some hotspots for us, which is a nice bonus
given how we are trying to build up contributors now. Not every
contributor to Pekko is knee-deep in JVM performance, and we don't have
the setup/labor right now to profile problematic cases the way
Lightbend/Akka did.

If any critical issues are exposed by the inliner I can definitely look
into reverting it, but I already did a lot of preparation work planning
for this, so I am not expecting anything.

On Thu, Jan 18, 2024 at 2:17 AM Johannes Rudolph <johannes.rudo...@gmail.com>
wrote:

> On Wed, Jan 17, 2024 at 2:07 AM Matthew de Detrich
> <matthew.dedetr...@aiven.io.invalid> wrote:
> > As you can see from the results there are some noticeable improvements
> > (i.e. 5-10% in some cases); however, I wouldn't take these results as
> > complete gospel, as I had to do the benchmarks on my M1 laptop (I had it
> > on AC power and used TGPro to put the fans on max blast to reduce
> > variability; unfortunately I am currently overseas, so I don't have a
> > dedicated machine to test on).
>
> Thanks for running these, Matthew. Alas, I cannot reproduce the
> difference. How often did you run to reproduce the noticeable results
> yourself?
>
> Here are a few more ideas:
>
>  - rather clock the CPU down and keep track of the actual frequencies
> while running (e.g. using perf stat), instead of turning the fans up.
> There might still be thermal reservoirs of unknown size and state even
> with the highest fan settings, especially in laptops.
>  - always run more than one fork; many benchmarks have (at least)
> bimodal distributions of performance in steady state, varying by a few
> percent. If you know there's a multimodal distribution, make a rough
> estimate of how many forks you need to be really sure you aren't
> comparing one coin flip with another.
>  - if the error bars overlap a lot, that's a sign that you need to run
> more benchmarks for validation (JMH reports a large 99.9% CI, but make
> sure you are confident the interval itself makes sense; 3 measurements
> are really the bare minimum for those to make any sense at all)
>
> Many of these issues are also taken care of by creating a (e.g.
> nightly) long-running benchmark series where the random fluctuations
> become quite apparent over days.
>
> In general, for complex benchmarks like in pekko-http, I like to use
> these rough guidelines for evaluating benchmark evidence:
>
>  * < 5% difference needs exceptional statistical evidence and a
> reasonable explanation for the behavior (e.g. you tried to optimize
> something before and the improvements are exactly in the area that you
> expected)
>  * 5-10% difference needs very good statistical evidence and/or
> explanations for the improvements
>  * ...
>  * > 10-15% if consistently better in multiple runs and environments,
> likely an improvement
>
> (When benchmarking single methods you might relax the judgement,
> though then the measured performance might not materialize in more
> realistic scenarios)
>
> The StreamedServerProcessing result seems somewhat internally
> inconsistent since the same "chunked" configuration with different
> chunk sizes shows somewhat different behavior which is possible but
> maybe not super likely?
>
> Also the result seems somewhat cherry-picked as e.g. most of the
> LineParserBenchmarks were slightly (up to 7%) better without inlining
> (speaking about your results in all of the above).
>
> Here are my quick results (also very weak evidence):
> https://gist.github.com/jrudolph/bc97146dedf0290d059e5e44939fbdc0
>
> Btw. the main improvement that the inliner promises is that it can
> avoid a particular kind of megamorphic call site in higher-order
> function use, i.e. Scala collection usage. In most performance
> critical situations, this has been manually optimized before where it
> turned up in profiles. All the other expensive places of megamorphic
> call sites (dispatchers, stream GraphInterpreter, routing DSL) are
> unfortunately not "static enough" that an AOT inliner could pick them
> up with static analysis.
>
> Johannes
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@pekko.apache.org
> For additional commands, e-mail: dev-h...@pekko.apache.org
>
>

-- 

Matthew de Detrich

*Aiven Deutschland GmbH*

Immanuelkirchstraße 26, 10405 Berlin

Alexanderufer 3-7, 10117 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

*m:* +491603708037

*w:* aiven.io *e:* matthew.dedetr...@aiven.io
