Thanks to everyone who threw in some ideas. I was able to prove that it is
**not** a JIT/HotSpot de-optimization.
First I got the following output when I used "-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":
Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000
currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831
zeroReads=2 partialReads=0
77543 560 % ! 4 Client::run @ -2 (270 bytes) made not entrant
Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001
currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531
realRead=2363 zeroReads=1 partialReads=0
Even a single branch like "this.interval = totalMessagesSent >= WARMUP ?
30000 : 1;" will trigger the "made not entrant". I even tried extracting it
into its own method, but still got "made not entrant".
So I thought: *THAT'S IT!*
Nope, that was **not** it :)
I got rid of the branch (i.e. IF) by replacing "this.interval =
totalMessagesSent >= WARMUP ? 30000 : 1;" with:
// not mathematically equivalent, but close enough for our purposes:
// for totalMessagesSent >= WARMUP it returns 30001 instead of 30000
this.interval = (totalMessagesSent / WARMUP * 30000) + 1;
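As a quick sanity check, the branchless form can be compared against the ternary it replaced. This is a standalone sketch with hypothetical class and method names (not the original benchmark); it just confirms the two expressions agree below WARMUP and differ by exactly 1 above it:

```java
// Standalone sanity check (hypothetical names, not the original benchmark):
// compares the branchless integer expression against the ternary it
// replaced, for message counts in [0, 2 * WARMUP).
public class IntervalCheck {
    static final long WARMUP = 100_000;

    // original branchy form
    static long branchy(long sent) {
        return sent >= WARMUP ? 30_000 : 1;
    }

    // branchless form: integer division yields 0 below WARMUP and
    // 1 at/above it (up to 2 * WARMUP), so the result is 1 or 30_001
    static long branchless(long sent) {
        return (sent / WARMUP * 30_000) + 1;
    }

    public static void main(String[] args) {
        for (long sent = 0; sent < 2 * WARMUP; sent++) {
            long a = branchy(sent), b = branchless(sent);
            boolean ok = (sent < WARMUP) ? (a == b) : (b == a + 1);
            if (!ok) throw new AssertionError("mismatch at " + sent);
        }
        System.out.println("ok: off by at most 1 past WARMUP");
    }
}
```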
Then I ran it again with "-XX:+PrintCompilation
-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining" and got:
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99998
currInterval=1 latency=4122 timeToWrite=2476 timeToRead=1626 realRead=756
zeroReads=2 partialReads=0
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99999
currInterval=1 latency=4041 timeToWrite=2387 timeToRead=1630 realRead=760
zeroReads=2 partialReads=0
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=100000
currInterval=1 latency=4223 timeToWrite=2504 timeToRead=1690 realRead=739
zeroReads=2 partialReads=0
Thu Apr 13 10:36:22 EDT 2017 Results: totalMessagesSent=100001
currInterval=30001 latency=*10245* timeToWrite=5457 timeToRead=4755
realRead=2223 zeroReads=6 partialReads=0
Thu Apr 13 10:36:45 EDT 2017 Results: totalMessagesSent=100002
currInterval=30001 latency=*10908* timeToWrite=4648 timeToRead=6237
realRead=719 zeroReads=7 partialReads=0
Thu Apr 13 10:37:08 EDT 2017 Results: totalMessagesSent=100003
currInterval=30001 latency=*10126* timeToWrite=4088 timeToRead=6005
realRead=2077 zeroReads=11 partialReads=0
So no JIT/HotSpot de-optimization.
I've also ruled out thread issues by pinning my thread to an isolated CPU
core through affinity and isolcpus.
This behavior happens on macOS and on Linux. I wonder if it also happens on
Windows, and in C. That would give us more clues.
Thanks,
-JC
On Thursday, April 13, 2017 at 2:44:49 AM UTC-5, Wojciech Kudla wrote:
>
> I'd also monitor /proc/interrupts and /proc/softirqs for your target cpu
>
> On Thu, 13 Apr 2017, 08:36 Gil Tene, <[email protected]> wrote:
>
>> If I read this right, you are running this on localhost (according to the
>> SO code). If that's the case, there is no actual network, and no actual TCP
>> stack... UDP or TCP won't make a difference then, and neither will any TCP
>> tweaking. I think this rules out the network, the switch, the NICs, and
>> most of the OS's network stack.
>>
>> Now you're looking at the JVM, the OS scheduling, power management, cache
>> behavior, etc.
>>
>> Some more things to play with to rule out or find some insight:
>>
>> - Rule out de-optimization (you may be de-optimizing when the if
>> (totalMessagesSent == WARMUP) triggers). Do this by examining the
>> -XX:+PrintCompilation output
>>
>> - Rule out scheduling and cpu migration effects: use isolcpus and pin
>> your processes to specific cores
>>
>> - How do you know that you actually disabled all power management? I'd
>> monitor the C-state and P-state to see what they actually are over time.
>> Cool anecdote: we once had a case where something in the system was
>> mysteriously elevating the C-state away from 0 after we set it to 0. We
>> never did find out what it was. The case was "resolved" with a cron job
>> that set the C-state to 0 every minute (yuck, I know).
>>
>> - Sweep different interval times in your tests to find out how long
>> the interval needs to be before you see the perf drop. The value at which
>> this effect starts may be an interesting hint
>>
>>
>> On Wednesday, April 12, 2017 at 12:56:22 PM UTC-7, J Crawford wrote:
>>>
>>> The SO question has the source code of a simple server and client that
>>> demonstrate and isolate the problem. Basically I'm timing the latency of a
>>> ping-pong (client-server-client) message. I start by sending one message
>>> every 1 millisecond. I wait for 200k messages to be sent so that
>>> HotSpot has a chance to optimize the code. Then I change my pause time from
>>> 1 millisecond to 30 seconds. To my surprise, my write and read operations
>>> become considerably slower.
>>>
>>> I don't think it is a JIT/HotSpot problem. I was able to pinpoint the
>>> slowdown to the native JNI calls to write (write0) and read. Even if I
>>> change the pause from 1 millisecond to 1 second, the problem persists.
>>>
>>> I was able to observe that on macOS and Linux.
>>>
>>> Does anyone here have a clue of what can be happening?
>>>
>>> Note that I'm disabling Nagle's Algorithm with setTcpNoDelay(true).
>>>
>>> SO question with code and output:
>>> http://stackoverflow.com/questions/43377600/socketchannel-why-if-i-write-msgs-quickly-the-latency-of-each-message-is-low-b
>>>
>>> Thanks!
>>>
>>> -JC
>>>
>
--
You received this message because you are subscribed to the Google Groups
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.