I ran the benchmark with a profiler and was able to reproduce both modes, 
fast and slow. The difference appears to come from how HotSpot compiles the 
DefaultMailbox.run() -> ActorContext.invokeUserMessage(msg) sequence. In 
fast mode, DefaultMailbox.run() inlines ActorContext.invokeUserMessage() 
and, in turn, both PingActor.autoReceive() and EchoActor.autoReceive(), 
because HotSpot recognizes that only two implementations of 
Actor.autoReceive() exist at that call site.
In slow mode, invokeUserMessage() goes through a series of initial 
compilations, is eventually deoptimized, and ends up calling 
PingActor.autoReceive() and EchoActor.autoReceive() via itable, i.e. the 
generic interface-call mechanism, which is quite expensive (could there 
even be more implementations than these two?).
Which path HotSpot takes may depend on how many distinct interface 
implementations it observes at the autoReceive() call site during each 
intermediate compilation, and that is effectively nondeterministic given 
the nature of the benchmark.
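
The call-site effect can be sketched with a stand-alone example. The names 
here (Actor, PingActor, EchoActor, dispatch) are stand-ins for illustration, 
not the actual protoactor-kotlin classes:

```java
// Sketch of the call-site shape described above; Actor/autoReceive are
// stand-ins, not the real protoactor-kotlin types.
interface Actor {
    int autoReceive(int msg);
}

final class PingActor implements Actor {
    public int autoReceive(int msg) { return msg + 1; }
}

final class EchoActor implements Actor {
    public int autoReceive(int msg) { return msg; }
}

public class CallSiteDemo {
    // A single shared call site: with only two receiver classes ever
    // observed here, HotSpot can compile it as a bimorphic check-and-inline;
    // a third implementation showing up would force deoptimization and a
    // generic itable dispatch instead.
    static int dispatch(Actor a, int msg) {
        return a.autoReceive(msg);
    }

    public static void main(String[] args) {
        Actor[] pair = { new PingActor(), new EchoActor() };
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += dispatch(pair[i & 1], i);
        }
        System.out.println(total);
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining shows 
whether the autoReceive() call was inlined or left as a virtual call.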

    -- Oleg

On Tuesday, August 1, 2017 at 10:26:55 AM UTC-7, Roger Alsing wrote:
>
> Some context: I'm building an actor framework, similar to Akka but 
> polyglot/cross-platform.
> For each platform we have the same benchmarks, where one of them is an in 
> process ping-pong benchmark.
>
> On .NET and Go, we can spin up pairs of ping-pong actors equal to the 
> number of cores in the CPU, and no matter if we spin up more pairs, the 
> total throughput remains roughly the same.
> But on the JVM, if we do this, I can see how we max out at 100% CPU, as 
> expected, but if I instead spin up many more pairs, e.g. 20 * core_count, 
> the total throughput triples.
>
> I suspect this is because the system runs in a more steady-state fashion 
> in the latter case: mailboxes are never completely drained, so actors 
> don't have to switch between processing and idle.
> Would this be fair to assume?
> This is why I believe this is a question for this specific forum.
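
The steady-state idea above can be sketched roughly like this; it is a 
hypothetical run loop for illustration, not the actual DefaultMailbox, and 
BATCH plus the field names are invented:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical mailbox sketch: an actor processes up to a fixed batch per
// run, then either stays scheduled (mailbox still non-empty -> steady
// state) or flips back to idle (mailbox drained -> must be rescheduled
// when the next message arrives).
public class MailboxSketch {
    static final int BATCH = 32;

    final Queue<Integer> queue = new ArrayDeque<>();
    boolean scheduled = false;
    int processed = 0;
    int idleTransitions = 0;

    void post(int msg) {
        queue.add(msg);
        scheduled = true;           // an idle actor pays a wake-up here
    }

    void run() {
        for (int i = 0; i < BATCH; i++) {
            Integer msg = queue.poll();
            if (msg == null) {
                scheduled = false;  // drained: switch back to idle
                idleTransitions++;
                return;
            }
            processed++;            // invokeUserMessage(msg) would go here
        }
        // Batch exhausted but mailbox non-empty: stay scheduled.
    }

    public static void main(String[] args) {
        MailboxSketch mb = new MailboxSketch();
        for (int i = 0; i < 100; i++) mb.post(i);
        while (mb.scheduled) mb.run();
        System.out.println(mb.processed + " processed, "
                + mb.idleTransitions + " idle transitions");
    }
}
```

With more pairs in flight, a real mailbox is less likely to be found empty, 
so the processing-to-idle transitions become rarer.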
>
> Now to the real question: roughly 60-40 of the time when the benchmark is 
> started, it runs steadily at 250 mil msg/sec, and the other times it runs 
> at 350 mil msg/sec.
> What I find strange is that each mode is stable over time: if I don't 
> stop the benchmark, it continues at the same pace.
>
> If anyone is bored and would like to try it out, the repo is here:
> https://github.com/AsynkronIT/protoactor-kotlin
> and the actual benchmark here: 
> https://github.com/AsynkronIT/protoactor-kotlin/blob/master/examples/src/main/kotlin/actor/proto/examples/inprocessbenchmark/InProcessBenchmark.kt
>
> The behavior is also consistent with or without various VM arguments.
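
For anyone reproducing this, HotSpot's diagnostic flags make the two 
compilation modes visible (the jar name below is a placeholder, not the 
actual benchmark artifact):

```shell
# Print JIT compilation events and inlining decisions.
# -XX:+PrintInlining requires -XX:+UnlockDiagnosticVMOptions.
java -XX:+PrintCompilation \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     -jar benchmark.jar   # placeholder jar name
```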
>
> I'm very interested to hear if anyone has any theories what could cause 
> this behavior.
>
> One factor that seems to be involved is GC, but not in the obvious way; 
> rather the reverse.
> In the beginning, when the framework allocated more memory, it more often 
> ran at the high speed.
> And the fewer allocations I've managed to make without touching the hot 
> path, the more the benchmark has started to toggle between these two 
> numbers.
>
> Thoughts?
>
