> On Apr 13, 2017, at 5:01 PM, J Crawford <[email protected]> wrote:
>
> Thanks to everyone who threw out some ideas. I was able to prove that it is
> *not* a JIT/HotSpot de-optimization.
Confirms my inkling, and fits with my experience with HotSpot issues. The inflation in latency is just way too large for that. Interesting that the deopt was an OSR event. Try making the messages that you send a size closer to the MTU.

Kind regards,
Kirk

> First I got the following output when I used "-XX:+PrintCompilation
> -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":
>
> Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000 currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831 zeroReads=2 partialReads=0
> 77543 560 % ! 4 Client::run @ -2 (270 bytes) made not entrant
> Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001 currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531 realRead=2363 zeroReads=1 partialReads=0
>
> Even a single branch like "this.interval = totalMessagesSent >= WARMUP ? 30000 : 1;"
> will trigger the "made not entrant". I even tried to make it a method, but still
> got "made not entrant".
>
> So I thought: THAT'S IT!
>
> Nope, that was *not* it :)
>
> I got rid of the branch (i.e. the IF) by replacing "this.interval = totalMessagesSent >= WARMUP ?
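For context, the branch-flip deoptimization discussed above can be reproduced with a toy loop like the one below. This is a sketch, not the poster's benchmark; the class name, WARMUP value, and iteration count are illustrative (WARMUP = 100,000 is inferred from the printed totalMessagesSent values).

```java
// Run with:
//   java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining DeoptDemo
// and watch for a "made not entrant" line shortly after the branch flips.
public class DeoptDemo {
    static final long WARMUP = 100_000;
    static volatile long sink; // volatile write keeps the loop body alive

    public static void main(String[] args) {
        long interval = 1;
        for (long sent = 0; sent < 200_000; sent++) {
            // The branch profile changes once sent crosses WARMUP,
            // invalidating the OSR-compiled loop body.
            interval = sent >= WARMUP ? 30_000 : 1;
            sink = interval;
        }
        System.out.println(interval); // prints: 30000
    }
}
```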
> 30000 : 1;" with:
>
> // not a mathematical equivalent, but close enough for our purposes:
> // for totalMessagesSent >= WARMUP it returns 30001 instead of 30000
> this.interval = (totalMessagesSent / WARMUP * 30000) + 1;
>
> Then I ran it again with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions
> -XX:+PrintInlining" and got:
>
> Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99998 currInterval=1 latency=4122 timeToWrite=2476 timeToRead=1626 realRead=756 zeroReads=2 partialReads=0
> Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99999 currInterval=1 latency=4041 timeToWrite=2387 timeToRead=1630 realRead=760 zeroReads=2 partialReads=0
> Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=100000 currInterval=1 latency=4223 timeToWrite=2504 timeToRead=1690 realRead=739 zeroReads=2 partialReads=0
> Thu Apr 13 10:36:22 EDT 2017 Results: totalMessagesSent=100001 currInterval=30001 latency=10245 timeToWrite=5457 timeToRead=4755 realRead=2223 zeroReads=6 partialReads=0
> Thu Apr 13 10:36:45 EDT 2017 Results: totalMessagesSent=100002 currInterval=30001 latency=10908 timeToWrite=4648 timeToRead=6237 realRead=719 zeroReads=7 partialReads=0
> Thu Apr 13 10:37:08 EDT 2017 Results: totalMessagesSent=100003 currInterval=30001 latency=10126 timeToWrite=4088 timeToRead=6005 realRead=2077 zeroReads=11 partialReads=0
>
> So no JIT/HotSpot de-optimization.
>
> I've also ruled out thread issues by pinning my thread to an isolated CPU core
> through affinity and isolcpus.
>
> This behavior happens on MacOS and on Linux. I wonder if it also happens on
> Windows. And in C. That would give us more clues.
>
> Thanks,
>
> -JC
>
> On Thursday, April 13, 2017 at 2:44:49 AM UTC-5, Wojciech Kudla wrote:
> I'd also monitor /proc/interrupts and /proc/softirqs for your target cpu.
>
> On Thu, 13 Apr 2017, 08:36 Gil Tene <[email protected]> wrote:
> If I read this right, you are running this on localhost (according to the SO code).
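The branchless replacement above can be checked in isolation. The sketch below assumes WARMUP = 100,000 (consistent with the printed totalMessagesSent values); the class and method names are made up:

```java
public class IntervalTrick {
    static final long WARMUP = 100_000;

    // Integer division yields 0 below WARMUP and 1 at or above it
    // (for counts below 2*WARMUP), so the result is 1 during warmup
    // and 30001 afterwards -- close to the intended 30000, but with
    // no conditional branch for the JIT to profile.
    static long interval(long totalMessagesSent) {
        return (totalMessagesSent / WARMUP * 30000) + 1;
    }

    public static void main(String[] args) {
        System.out.println(interval(99_999));   // prints: 1
        System.out.println(interval(100_001));  // prints: 30001
    }
}
```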
> If that's the case, there is no actual network, and no actual TCP stack...
> UDP or TCP won't make a difference then, and neither will any TCP tweaking.
> I think this rules out the network, the switch, the NICs, and most of the
> OS's network stack.
>
> Now you're looking at the JVM, the OS scheduling, power management, cache
> behavior, etc.
>
> Some more things to play with to rule out or find some insight:
>
> - Rule out de-optimization (you may be de-optimizing when the
> if (totalMessagesSent == WARMUP) triggers). Do this by examining
> -XX:+PrintCompilation output.
>
> - Rule out scheduling and CPU migration effects: use isolcpus and pin your
> processes to specific cores.
>
> - How do you know that you actually disabled all power management? I'd
> monitor cstate and pstate to see what they actually are over time.
> Cool anecdote: we once had a case where something in the system was
> mysteriously elevating cstate away from 0 after we set it to 0. We never did
> find out what it was. The case was "resolved" with a cron job that set cstate
> to 0 every minute (yuck, I know).
>
> - Sweep different interval times in your tests to find out how long the
> interval needs to be before you see the perf drop. The value at which this
> effect starts may be an interesting hint.
>
> On Wednesday, April 12, 2017 at 12:56:22 PM UTC-7, J Crawford wrote:
> The SO question has the source code of a simple server and client that
> demonstrate and isolate the problem. Basically I'm timing the latency of a
> ping-pong (client-server-client) message. I start by sending one message
> every 1 millisecond. I wait for 200k messages to be sent so that HotSpot
> has a chance to optimize the code. Then I change my pause time from
> 1 millisecond to 30 seconds. To my surprise, my write and read operations
> become considerably slower.
>
> I don't think it is a JIT/HotSpot problem. I was able to pinpoint the slower
> method to the native JNI calls to write (write0) and read.
> Even if I change the pause from 1 millisecond to 1 second, the problem persists.
>
> I was able to observe this on MacOS and on Linux.
>
> Does anyone here have a clue about what could be happening?
>
> Note that I'm disabling Nagle's Algorithm with setTcpNoDelay(true).
>
> SO question with code and output:
> http://stackoverflow.com/questions/43377600/socketchannel-why-if-i-write-msgs-quickly-the-latency-of-each-message-is-low-b
>
> Thanks!
>
> -JC

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
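For reference, the Nagle disable mentioned in the thread, written against NIO's SocketChannel (a sketch; the commented-out endpoint is a placeholder, not taken from the poster's code):

```java
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public class NoDelayClient {
    // Opens a channel, disables Nagle's Algorithm, and reports the setting.
    static boolean openNoDelay() throws Exception {
        try (SocketChannel ch = SocketChannel.open()) {
            // NIO equivalent of Socket.setTcpNoDelay(true)
            ch.setOption(StandardSocketOptions.TCP_NODELAY, true);
            // ch.connect(new InetSocketAddress("127.0.0.1", 9999)); // placeholder endpoint
            return ch.getOption(StandardSocketOptions.TCP_NODELAY);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(openNoDelay()); // prints: true
    }
}
```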
