+1 to the default of Block! 1ns is too small. No wonder is sucked CPU. :-)
Thanks for looking! -Chris Sent from my iPhone On 21/03/2014, at 1:32 PM, Remko Popma <[email protected]> wrote: > Took another look at the Disruptor SleepingWait strategy. It actually uses > a back-off strategy. From the javadoc: > > Sleeping strategy that initially spins, then uses a Thread.yield(), and > eventually for the minimum number of nanos the OS and JVM will allow while > the {@link com.lmax.disruptor.EventProcessor}s are waiting on a barrier. > This strategy is a good compromise between performance and CPU resource. > Latency spikes can occur after quiet periods. > > The Disruptor SleepingWait strategy code eventually calls LockSupport. > parkNanos(1L); > Different platforms have different timer resolution (I think Windows cannot > go smaller than ~15 millis), and it is possible that AIX has a more > accurate clock. > > I'm beginning to think perhaps BlockingWait should be the default for log4j. > > On Fri, Mar 21, 2014 at 9:33 AM, Chris Graham <[email protected]> wrote: > >> The AIX system clock is not the same base time as most Intel boxes. What is >> the sleep time in the sleep strategy? If it's being derived, it might be >> too small. ??? >> >> Just a thought. >> >> To further complicate matters, this particular lpar was uncapped, which >> means that it can steal CPU from other lpars that are not as busy. So the >> number of active CPU's can dynamically vary. >> >> -Chris >> >> >> >> On Fri, Mar 21, 2014 at 9:01 AM, Remko Popma <[email protected]> >> wrote: >> >>> No, it turned out that the docs for Apache Archiva were incorrect and the >>> WaitStrategy was effectively still SleepingWaitStrategy. Using the >> correct >>> config to specify BlockingWaitStrategy solved the issue. (LOG4J2-571) >>> >>> FYI, the wait strategy determines what the consumer thread does when the >>> queue is empty & it's waiting for events. Some specialized apps want to >>> avoid the latency of waking up a blocked thread, so there are a number of >>> options with different trade-offs, with busy spin on one end of the >>> spectrum and blocking on the other. For log4j I reduced the set of >>> available options (no busy spin), and choose SleepingWait as the >> default. >>> This had the best performance in my testing. Until now I hadn't seen any >>> excessive CPU usage. >>> >>> Sent from my iPhone >>> >>>> On 2014/03/20, at 22:10, Matt Sicker <[email protected]> wrote: >>>> >>>> Perhaps lmax disruptor doesn't work properly in the IBM JVM? >>>> >>>>> On Tuesday, 18 March 2014, Chris Graham <[email protected]> wrote: >>>>> >>>>> JStack is a Sun thing. This is the IBM JDK on AIX. >>>>> I've run the tprof command twice and verified it. >>>>> >>>>> The full work though follows. >>>>> >>>>> The output from topas (same as top, effectively) is: >>>>> >>>>> Topas Monitor for host: XXXXXXXXXXXXXXX EVENTS/QUEUES >>> FILE/TTY >>>>> Wed Mar 19 14:49:55 2014 Interval: 2 Cswitch 3581 >>>>> Readch 686 >>>>> Syscall 2763 >> Writech >>>>> 1378 >>>>> CPU User% Kern% Wait% Idle% Physc Entc Reads 7 >>>>> Rawin 0 >>>>> ALL 48.8 1.2 0.0 50.1 1.03 51.7 Writes 5 >>>>> Ttyout 643 >>>>> Forks 0 >>>>> Igets 0 >>>>> Network KBPS I-Pack O-Pack KB-In KB-Out Execs 0 >>>>> Namei 81 >>>>> Total 2.8 10.0 7.5 1.1 1.7 Runqueue 1.0 >>>>> Dirblk 0 >>>>> Waitqueue 0.0 >>>>> Disk Busy% KBPS TPS KB-Read KB-Writ >> MEMORY >>>>> Total 0.0 0.0 0.0 0.0 0.0 PAGING >> Real,MB >>>>> 20480 >>>>> Faults 3 % Comp >>>>> 44 >>>>> FileSystem KBPS TPS KB-Read KB-Writ Steals 0 % >>> Noncomp >>>>> 54 >>>>> Total 0.6 1.5 0.6 0.0 PgspIn 0 % >>> Client >>>>> 54 >>>>> PgspOut 0 >>>>> Name PID CPU% PgSp Owner PageIn 0 >> PAGING >>>>> SPACE >>>>> java 9437312 48.6 739.1 wasadmin PageOut 0 >> Size,MB >>>>> 15552 >>>>> >>>>> Process with a PID of 9437312 is the WebSphere instance that runs >>> Archiva, >>>>> the problem in question. >>>>> >>>>> We then use a native AIX tool, tprof to examine that process, and see >>> what >>>>> it's doing: >>>>> >>>>> tprof -j -P 9437312 -skex sleep 60 >>>>> >>>>> This generates sleep.prof, and the relevant section is: >>>>> >>>>> Configuration information >>>>> ========================= >>>>> System: AIX 6.1 Node: au02qap207teax2 Machine: 00C43D204C00 >>>>> Tprof command was: >>>>> tprof -j -P 9437312 -skex sleep 60 >>>>> Trace command was: >>>>> /usr/bin/trace -ad -M -L 2438933299 -T 500000 -j >>>>> 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,234,5D8, -o - >>>>> Total Samples = 24008 >>>>> Traced Time = 60.02s (out of a total execution time of 60.02s) >>>>> >>>>> >>> >> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< >>>>> >>>>> Process Freq Total Kernel User Shared >> Other >>>>> Java >>>>> ======= ==== ===== ====== ==== ====== >> ===== >>>>> ==== >>>>> wait 4 64.29 64.29 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> java 182 34.66 1.97 0.00 32.65 >> 0.05 >>>>> 0.00 >>>>> /usr/sbin/syncd 2 0.17 0.17 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> sshd: 1 0.14 0.09 0.05 0.01 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/ps 13 0.12 0.10 0.00 0.03 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/sh 27 0.12 0.11 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> httpd 12 0.09 0.05 0.01 0.03 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/scp 1 0.09 0.07 0.00 0.01 >> 0.00 >>>>> 0.00 >>>>> /opt/freeware/bin/readlink 6 0.04 0.04 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/tprof 1 0.03 0.00 0.01 0.02 >> 0.00 >>>>> 0.00 >>>>> kulagent 7 0.03 0.02 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> swapper 1 0.02 0.02 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/grep 5 0.02 0.02 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/sbin/getty 1 0.02 0.02 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> kuxagent 3 0.02 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> lrud 1 0.02 0.02 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> srmAIX 3 0.01 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /opt/tivoli/cit/bin/wscanuse 1 0.01 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> psmd 1 0.01 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> xmgc 1 0.01 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> nfs_stat 2 0.01 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/topasrec 2 0.01 0.01 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/xmtopas 1 0.01 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> kcawd 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/sbin/snmpd 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/sbin/snmpmibd 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/date 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/entstat 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> gil 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/vmstat 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/bin/ln 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> stat_daemon 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> j2pg 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> /usr/sbin/netstat 1 0.00 0.00 0.00 0.00 >> 0.00 >>>>> 0.00 >>>>> ======= ==== ===== ====== ==== ====== >> ===== >>>>> ==== >>>>> Total 289 100.00 67.11 0.07 32.77 >> 0.05 >>>>> 0.00 >>>>> >>>>> Process PID TID Total Kernel User Shared Other >>>>> Java >>>>> ======= === === ===== ====== ==== ====== ===== >>>>> ==== >>>>> java 9437312 46072057 23.76 0.05 0.00 23.70 0.00 >>>>> 0.00 >>>>> >>>>> Which gives me the needed TID, 46072057 = 0x2BF00F9. >>>>> >>>>> So to trigger a non-fatal heap dump: >>>>> >>>>> kill -3 9437312 >>>>> >>>>> and then we look into the created >>> javacore.20140318.173250.9437312.0002.txt >>>>> file for references to 0x2BF00F9: >>>>> >>>>> 3XMTHREADINFO "AsyncLoggerConfig-1" >> J9VMThread:0x0000000031D14600, >>>>> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508, >>>>> state:CW, prio=5 >>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x6A, isDaemon:true) >>>>> 3XMTHREADINFO1 (native thread ID:0x2BF00F9, native >>> priority:0x5, >>>>> native policy:UNKNOWN) >>>>> 3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 >>> (0x0) >>>>> 3XMTHREADINFO3 Java callstack: >>>>> 4XESTACKTRACE at sun/misc/Unsafe.park(Native Method) >>>>> 4XESTACKTRACE at >>>>> java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332) >>>>> 4XESTACKTRACE at >>>>> >>>>> >>> >> com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66) >>>>> 4XESTACKTRACE at >>>>> >>>>> >>> >> com/lmax/disruptor/SleepingWaitStrategy.waitFor(SleepingWaitStrategy.java:39) >>>>> 4XESTACKTRACE at >>>>> >>>>> >>> >> com/lmax/disruptor/ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55) >>>>> 4XESTACKTRACE at >>>>> >> com/lmax/disruptor/BatchEventProcessor.run(BatchEventProcessor.java:115) >>>>> 4XESTACKTRACE at >>>>> >>>>> >>> >> java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) >>>>> 4XESTACKTRACE at >>>>> >>>>> >>> >> java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) >>>>> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:773) >>>>> 3XMTHREADINFO3 Native callstack: >>>>> 4XENATIVESTACK _event_wait+0x2b8 (0x09000000007E7D3C >>>>> [libpthreads.a+0x16d3c]) >>>>> 4XENATIVESTACK _cond_wait_local+0x4e4 >> (0x09000000007F5A48 >>>>> [libpthreads.a+0x24a48]) >>>>> 4XENATIVESTACK _cond_wait+0xbc (0x09000000007F6020 >>>>> [libpthreads.a+0x25020]) >>>>> 4XENATIVESTACK pthread_cond_wait+0x1a8 >> (0x09000000007F6C8C >>>>> [libpthreads.a+0x25c8c]) >>>>> 4XENATIVESTACK (0x0900000001223014 >> [libj9thr26.so+0x6014]) >>>>> 4XENATIVESTACK (0x0900000001222C60 >> [libj9thr26.so+0x5c60]) >>>>> 4XENATIVESTACK (0x090000000116AE58 >> [libj9vm26.so+0xfe58]) >>>>> 4XENATIVESTACK (0x090000000116B17C >> [libj9vm26.so+0x1017c]) >>>>> 4XENATIVESTACK (0x0900000001810528 >>>>> [libjclscar_26.so+0x5c528]) >>>>> 4XENATIVESTACK (0x0900000001813B98 >>>>> [libjclscar_26.so+0x5fb98]) >>>>> 4XENATIVESTACK (0x0900000001161764 >> [libj9vm26.so+0x6764]) >>>>> 4XENATIVESTACK (0x0900000001239CA0 >> [libj9prt26.so+0x2ca0]) >>>>> 4XENATIVESTACK (0x09000000011615D4 >> [libj9vm26.so+0x65d4]) >>>>> 4XENATIVESTACK (0x090000000121FAF4 >> [libj9thr26.so+0x2af4]) >>>>> 4XENATIVESTACK _pthread_body+0xf0 (0x09000000007D4D34 >>>>> [libpthreads.a+0x3d34]) >>>>> NULL >>>>> >>>>> Would you like me to attach the complete sleep.tprof and javacore.txt >>> file >>>>> to the Jira ticket that I just created? >>>>> >>>>> https://issues.apache.org/jira/browse/LOG4J2-571 >>>>> >>>>> Thanks for looking, this has been driving me nuts. >>>>> >>>>> -Chris >>>>> >>>>> >>>>> >>>>> On Wed, Mar 19, 2014 at 2:45 PM, Remko Popma <[email protected]> >>>>> wrote: >>>>> >>>>>> Just to double-check: when using blocking wait, and running jstack, >>> does >>>>>> BlockingWaitStrategy appear in the stack trace? >>>>>> Also, it it possible to double-check (perhaps attach VisualVM) that >> it >>>>>> definitely is the AsyncLoggerConfig-1 thread that consumes so much >> CPU? >>>>>> >>>>>> >>>>>> Sent from my iPhone >>>>>> >>>>>>> On 2014/03/19, at 12:31, Chris Graham <[email protected]> wrote: >>>>>>> >>>>>>> I have tried both Block and Sleep (the default), but not Yield. No >>>>>>> discernable difference. >>>>>>> >>>>>>> >>>>>>>> On Wed, Mar 19, 2014 at 1:53 PM, Remko Popma < >> [email protected]> >>>>>> wrote: >>>>>>>> >>>>>>>> As tweeted, I suggest trying the blocking wait strategy. Can you >> run >>> a >>>>>>>> jstack dump (and perhaps attach result to a Jira ticket)? In the >>>>>> attached >>>>>>>> stack trace below, the AsyncLoggerConfig-1 thread seems to be >> parked, >>>>>>>> waiting for a new log event... Doesn't explain high CPU usage... >>>>>>>> >>>>>>>> Sent from my iPhone >>>>>>>> >>>>>>>>> On 2014/03/19, at 10:27, Chris Graham <[email protected]> >> wrote: >>>>>>>>> >>>>>>>>> Hello Everyone. >>>>>>>>> >>>>>>>>> In this instance, I'm in indirect used of log4j2 2.0-rc1, as it's >> in >>>>>> the >>>>>>>>> web app that I'm using, Apache Archiva 2.0.1. >>>>>>>>> >>>>>>>>> The issue is that when running under WebSphere 8.5.0.2 (obviously >> on >>>>>> the >>>>>>>>> IBM JDK, 1.6) on AIX 6.1 TL8, Apache Archiva when it's doing >>> nothing, >>>>>> is >>>>>>>>> sitting idle on around 50% CPU. >>>>>>>>> >>>>>>>>> Obviosuly, this is not good! >>>>>>>>> >>>>>>>>> I've performed the AIX native analysis, to get the native thread >> ID, >>>>>>>> mapped >>>>>>>>> it to a Java thread it, triggered a heap dump, and I've found this >>> as >>>>>> the >>>>>>>>> culprit: >>>>>>>>> >>>>>>>>> 3XMTHREADINFO "AsyncLoggerConfig-1" >>>>> J9VMThread:0x0000000031D14600, >>>>>>>>> j9thread_t:0x00000100137D8BD0, >> java/lang/Thread:0x000000004301C508, >>>>>>>>> state:CW, prio=5 >>>>>>>>> 3XMJAVALTHREAD (java/lang/Thread getId:0x6A, >>>>> isDaemon:true) >>>>>>>>> 3XMTHREADINFO1 (native thread ID:0x2BF00F9, native >>>>>>>> priority:0x5, >>>>>>>>> native policy:UNKNOWN) >>>>>>>>> 3XMHEAPALLOC Heap bytes allocated since last GC >> cycle=0 >>>>>> (0x0) >>>>>>>>> 3XMTHREADINFO3 Java callstack: >>>>>>>>> 4XESTACKTRACE at sun/misc/Unsafe.park(Native >> Method) >>>>>>>>> 4XESTACKTRACE at >>>>> java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332) >>>>>>>>> 4XESTACKTRACE at >>>>> >>> >> com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66) >>>>>>>>> 4XESTACKTRACE >>>> >>>> >>>> >>>> -- >>>> Matt Sicker <[email protected]> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [email protected] >>> For additional commands, e-mail: [email protected] >>> >>> >> --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
