Took another look at the Disruptor SleepingWait strategy. It actually uses
a back-off strategy. From the javadoc:
Sleeping strategy that initially spins, then uses a Thread.yield(), and
eventually sleeps for the minimum number of nanos the OS and JVM will allow
while the {@link com.lmax.disruptor.EventProcessor}s are waiting on a barrier.
This strategy is a good compromise between performance and CPU resource.
Latency spikes can occur after quiet periods.
The Disruptor SleepingWait strategy code eventually calls
LockSupport.parkNanos(1L);
Different platforms have different timer resolution (I think Windows cannot
go smaller than ~15 millis), and it is possible that AIX has a more
accurate clock.
I'm beginning to think perhaps BlockingWait should be the default for log4j.
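For reference, the back-off shape described above can be sketched in plain Java. This is a simplified model of the LMAX 3.x code, not the actual class: the thresholds (200 retries, spin above 100, then yield, then parkNanos(1L)) match my reading of the source, and all names except applyWaitMethod are mine.

```java
import java.util.concurrent.locks.LockSupport;

// Illustrative sketch of the SleepingWaitStrategy back-off; not the LMAX API.
public class SleepingBackoff {
    static final int RETRIES = 200; // starting countdown, as in the LMAX source

    // Decide which wait action a given countdown value maps to.
    static String nextAction(int counter) {
        if (counter > 100) return "SPIN";   // busy-spin first
        if (counter > 0)   return "YIELD";  // then give up the CPU slice
        return "PARK";                      // finally parkNanos(1L) forever
    }

    // One back-off step: perform the action and return the new counter.
    static int applyWaitMethod(int counter) {
        switch (nextAction(counter)) {
            case "SPIN":  return counter - 1;
            case "YIELD": Thread.yield(); return counter - 1;
            default:      LockSupport.parkNanos(1L); return counter;
        }
    }

    public static void main(String[] args) {
        int counter = RETRIES;
        for (int i = 0; i < 250; i++) {
            counter = applyWaitMethod(counter);
        }
        System.out.println("counter after 250 steps: " + counter); // prints 0
    }
}
```

Once the spin and yield budgets are exhausted the counter stays at zero, so an idle consumer parks for 1ns per iteration indefinitely. On a platform where parkNanos(1L) returns almost immediately, that loop can still burn noticeable CPU, which may be what we're seeing on AIX.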
On Fri, Mar 21, 2014 at 9:33 AM, Chris Graham <[email protected]> wrote:
> The AIX system clock is not the same base time as most Intel boxes. What is
> the sleep time in the sleep strategy? If it's being derived, it might be
> too small. ???
>
> Just a thought.
>
> To further complicate matters, this particular lpar was uncapped, which
> means that it can steal CPU from other lpars that are not as busy. So the
> number of active CPUs can vary dynamically.
>
> -Chris
>
>
>
> On Fri, Mar 21, 2014 at 9:01 AM, Remko Popma <[email protected]>
> wrote:
>
> > No, it turned out that the docs for Apache Archiva were incorrect and the
> > WaitStrategy was effectively still SleepingWaitStrategy. Using the correct
> > config to specify BlockingWaitStrategy solved the issue. (LOG4J2-571)
> >
> > FYI, the wait strategy determines what the consumer thread does when the
> > queue is empty and it's waiting for events. Some specialized apps want to
> > avoid the latency of waking up a blocked thread, so there are a number of
> > options with different trade-offs, with busy spin on one end of the
> > spectrum and blocking on the other. For log4j I reduced the set of
> > available options (no busy spin), and chose SleepingWait as the default.
> > This had the best performance in my testing. Until now I hadn't seen any
> > excessive CPU usage.
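For anyone following along: in the 2.0-era releases the strategy is selected with a system property rather than in the XML config. A hedged sketch, with the property name and values taken from my reading of the Async Loggers manual (verify against your version; the jar names and main class here are placeholders):

```shell
# Select the wait strategy for AsyncLoggerConfig consumer threads.
# Accepted values are reportedly Sleep (default), Yield, and Block.
java -DAsyncLoggerConfig.WaitStrategy=Block \
     -cp "log4j-api-2.0-rc1.jar:log4j-core-2.0-rc1.jar:disruptor-3.2.0.jar:myapp.jar" \
     com.example.MyApp   # placeholder main class
```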
> >
> > Sent from my iPhone
> >
> > > On 2014/03/20, at 22:10, Matt Sicker <[email protected]> wrote:
> > >
> > > Perhaps lmax disruptor doesn't work properly in the IBM JVM?
> > >
> > >> On Tuesday, 18 March 2014, Chris Graham <[email protected]> wrote:
> > >>
> > >> JStack is a Sun thing. This is the IBM JDK on AIX.
> > >> I've run the tprof command twice and verified it.
> > >>
> > >> The full work though follows.
> > >>
> > >> The output from topas (same as top, effectively) is:
> > >>
> > >> Topas Monitor for host: XXXXXXXXXXXXXXX     EVENTS/QUEUES    FILE/TTY
> > >> Wed Mar 19 14:49:55 2014   Interval: 2      Cswitch    3581  Readch    686
> > >>                                             Syscall    2763  Writech  1378
> > >> CPU  User% Kern% Wait% Idle% Physc  Entc    Reads         7  Rawin       0
> > >> ALL   48.8   1.2   0.0  50.1  1.03  51.7    Writes        5  Ttyout    643
> > >>                                             Forks         0  Igets       0
> > >> Network KBPS  I-Pack O-Pack KB-In KB-Out    Execs         0  Namei      81
> > >> Total    2.8    10.0    7.5   1.1    1.7    Runqueue    1.0  Dirblk      0
> > >>                                             Waitqueue   0.0
> > >> Disk  Busy%  KBPS  TPS  KB-Read KB-Writ                     MEMORY
> > >> Total   0.0   0.0  0.0      0.0     0.0     PAGING          Real,MB 20480
> > >>                                             Faults   3      % Comp     44
> > >> FileSystem  KBPS  TPS  KB-Read KB-Writ      Steals   0      % Noncomp  54
> > >> Total        0.6  1.5      0.6     0.0      PgspIn   0      % Client   54
> > >>                                             PgspOut  0
> > >> Name  PID      CPU%  PgSp   Owner           PageIn   0      PAGING SPACE
> > >> java  9437312  48.6  739.1  wasadmin        PageOut  0      Size,MB 15552
> > >>
> > >> The process with PID 9437312 is the WebSphere instance that runs
> > >> Archiva, the application in question.
> > >>
> > >> We then use a native AIX tool, tprof, to examine that process and see
> > >> what it's doing:
> > >>
> > >> tprof -j -P 9437312 -skex sleep 60
> > >>
> > >> This generates sleep.prof, and the relevant section is:
> > >>
> > >> Configuration information
> > >> =========================
> > >> System: AIX 6.1 Node: au02qap207teax2 Machine: 00C43D204C00
> > >> Tprof command was:
> > >> tprof -j -P 9437312 -skex sleep 60
> > >> Trace command was:
> > >> /usr/bin/trace -ad -M -L 2438933299 -T 500000 -j
> > >> 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,234,5D8, -o -
> > >> Total Samples = 24008
> > >> Traced Time = 60.02s (out of a total execution time of 60.02s)
> > >>
> > >>
> > >> Process                       Freq  Total Kernel  User Shared Other  Java
> > >> =======                       ====  ===== ======  ==== ====== =====  ====
> > >> wait                             4  64.29  64.29  0.00   0.00  0.00  0.00
> > >> java                           182  34.66   1.97  0.00  32.65  0.05  0.00
> > >> /usr/sbin/syncd                  2   0.17   0.17  0.00   0.00  0.00  0.00
> > >> sshd:                            1   0.14   0.09  0.05   0.01  0.00  0.00
> > >> /usr/bin/ps                     13   0.12   0.10  0.00   0.03  0.00  0.00
> > >> /usr/bin/sh                     27   0.12   0.11  0.00   0.00  0.00  0.00
> > >> httpd                           12   0.09   0.05  0.01   0.03  0.00  0.00
> > >> /usr/bin/scp                     1   0.09   0.07  0.00   0.01  0.00  0.00
> > >> /opt/freeware/bin/readlink       6   0.04   0.04  0.00   0.00  0.00  0.00
> > >> /usr/bin/tprof                   1   0.03   0.00  0.01   0.02  0.00  0.00
> > >> kulagent                         7   0.03   0.02  0.00   0.00  0.00  0.00
> > >> swapper                          1   0.02   0.02  0.00   0.00  0.00  0.00
> > >> /usr/bin/grep                    5   0.02   0.02  0.00   0.00  0.00  0.00
> > >> /usr/sbin/getty                  1   0.02   0.02  0.00   0.00  0.00  0.00
> > >> kuxagent                         3   0.02   0.01  0.00   0.00  0.00  0.00
> > >> lrud                             1   0.02   0.02  0.00   0.00  0.00  0.00
> > >> srmAIX                           3   0.01   0.01  0.00   0.00  0.00  0.00
> > >> /opt/tivoli/cit/bin/wscanuse     1   0.01   0.01  0.00   0.00  0.00  0.00
> > >> psmd                             1   0.01   0.01  0.00   0.00  0.00  0.00
> > >> xmgc                             1   0.01   0.01  0.00   0.00  0.00  0.00
> > >> nfs_stat                         2   0.01   0.01  0.00   0.00  0.00  0.00
> > >> /usr/bin/topasrec                2   0.01   0.01  0.00   0.00  0.00  0.00
> > >> /usr/bin/xmtopas                 1   0.01   0.00  0.00   0.00  0.00  0.00
> > >> kcawd                            1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/sbin/snmpd                  1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/sbin/snmpmibd               1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/bin/date                    1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/bin/entstat                 1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> gil                              1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/bin/vmstat                  1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/bin/ln                      1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> stat_daemon                      1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> j2pg                             1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> /usr/sbin/netstat                1   0.00   0.00  0.00   0.00  0.00  0.00
> > >> =======                       ====  ===== ======  ==== ====== =====  ====
> > >> Total                          289 100.00  67.11  0.07  32.77  0.05  0.00
> > >>
> > >> Process  PID      TID       Total  Kernel  User  Shared  Other  Java
> > >> =======  ===      ===       =====  ======  ====  ======  =====  ====
> > >> java     9437312  46072057  23.76    0.05  0.00   23.70   0.00  0.00
> > >>
> > >> Which gives me the needed TID, 46072057 = 0x2BF00F9.
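That decimal-to-hex mapping can be double-checked from the shell:

```shell
# Map tprof's decimal TID to the hex "native thread ID" used in the javacore.
printf '0x%X\n' 46072057    # prints 0x2BF00F9
```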
> > >>
> > >> So to trigger a javacore (a non-fatal thread dump on the IBM JDK):
> > >>
> > >> kill -3 9437312
> > >>
> > >> and then we look into the created
> > >> javacore.20140318.173250.9437312.0002.txt file for references to
> > >> 0x2BF00F9:
> > >>
> > >> 3XMTHREADINFO  "AsyncLoggerConfig-1" J9VMThread:0x0000000031D14600,
> > >> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508,
> > >> state:CW, prio=5
> > >> 3XMJAVALTHREAD     (java/lang/Thread getId:0x6A, isDaemon:true)
> > >> 3XMTHREADINFO1     (native thread ID:0x2BF00F9, native priority:0x5,
> > >> native policy:UNKNOWN)
> > >> 3XMHEAPALLOC       Heap bytes allocated since last GC cycle=0 (0x0)
> > >> 3XMTHREADINFO3     Java callstack:
> > >> 4XESTACKTRACE        at sun/misc/Unsafe.park(Native Method)
> > >> 4XESTACKTRACE        at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332)
> > >> 4XESTACKTRACE        at com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66)
> > >> 4XESTACKTRACE        at com/lmax/disruptor/SleepingWaitStrategy.waitFor(SleepingWaitStrategy.java:39)
> > >> 4XESTACKTRACE        at com/lmax/disruptor/ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55)
> > >> 4XESTACKTRACE        at com/lmax/disruptor/BatchEventProcessor.run(BatchEventProcessor.java:115)
> > >> 4XESTACKTRACE        at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> > >> 4XESTACKTRACE        at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> > >> 4XESTACKTRACE        at java/lang/Thread.run(Thread.java:773)
> > >> 3XMTHREADINFO3     Native callstack:
> > >> 4XENATIVESTACK       _event_wait+0x2b8 (0x09000000007E7D3C [libpthreads.a+0x16d3c])
> > >> 4XENATIVESTACK       _cond_wait_local+0x4e4 (0x09000000007F5A48 [libpthreads.a+0x24a48])
> > >> 4XENATIVESTACK       _cond_wait+0xbc (0x09000000007F6020 [libpthreads.a+0x25020])
> > >> 4XENATIVESTACK       pthread_cond_wait+0x1a8 (0x09000000007F6C8C [libpthreads.a+0x25c8c])
> > >> 4XENATIVESTACK       (0x0900000001223014 [libj9thr26.so+0x6014])
> > >> 4XENATIVESTACK       (0x0900000001222C60 [libj9thr26.so+0x5c60])
> > >> 4XENATIVESTACK       (0x090000000116AE58 [libj9vm26.so+0xfe58])
> > >> 4XENATIVESTACK       (0x090000000116B17C [libj9vm26.so+0x1017c])
> > >> 4XENATIVESTACK       (0x0900000001810528 [libjclscar_26.so+0x5c528])
> > >> 4XENATIVESTACK       (0x0900000001813B98 [libjclscar_26.so+0x5fb98])
> > >> 4XENATIVESTACK       (0x0900000001161764 [libj9vm26.so+0x6764])
> > >> 4XENATIVESTACK       (0x0900000001239CA0 [libj9prt26.so+0x2ca0])
> > >> 4XENATIVESTACK       (0x09000000011615D4 [libj9vm26.so+0x65d4])
> > >> 4XENATIVESTACK       (0x090000000121FAF4 [libj9thr26.so+0x2af4])
> > >> 4XENATIVESTACK       _pthread_body+0xf0 (0x09000000007D4D34 [libpthreads.a+0x3d34])
> > >> NULL
> > >>
> > >> Would you like me to attach the complete sleep.tprof and javacore.txt
> > >> files to the Jira ticket that I just created?
> > >>
> > >> https://issues.apache.org/jira/browse/LOG4J2-571
> > >>
> > >> Thanks for looking, this has been driving me nuts.
> > >>
> > >> -Chris
> > >>
> > >>
> > >>
> > >> On Wed, Mar 19, 2014 at 2:45 PM, Remko Popma <[email protected]>
> > >> wrote:
> > >>
> > >>> Just to double-check: when using blocking wait, and running jstack,
> > >>> does BlockingWaitStrategy appear in the stack trace?
> > >>> Also, is it possible to double-check (perhaps by attaching VisualVM)
> > >>> that it definitely is the AsyncLoggerConfig-1 thread that consumes so
> > >>> much CPU?
> > >>>
> > >>>
> > >>> Sent from my iPhone
> > >>>
> > >>>> On 2014/03/19, at 12:31, Chris Graham <[email protected]> wrote:
> > >>>>
> > >>>> I have tried both Block and Sleep (the default), but not Yield. No
> > >>>> discernible difference.
> > >>>>
> > >>>>
> > >>>>> On Wed, Mar 19, 2014 at 1:53 PM, Remko Popma <
> [email protected]>
> > >>> wrote:
> > >>>>>
> > >>>>> As tweeted, I suggest trying the blocking wait strategy. Can you run
> > >>>>> a jstack dump (and perhaps attach the result to a Jira ticket)? In
> > >>>>> the attached stack trace below, the AsyncLoggerConfig-1 thread seems
> > >>>>> to be parked, waiting for a new log event... That doesn't explain the
> > >>>>> high CPU usage...
> > >>>>>
> > >>>>> Sent from my iPhone
> > >>>>>
> > >>>>>> On 2014/03/19, at 10:27, Chris Graham <[email protected]>
> wrote:
> > >>>>>>
> > >>>>>> Hello Everyone.
> > >>>>>>
> > >>>>>> In this instance, I'm an indirect user of log4j2 2.0-rc1, as it's
> > >>>>>> in the web app that I'm using, Apache Archiva 2.0.1.
> > >>>>>>
> > >>>>>> The issue is that when running under WebSphere 8.5.0.2 (obviously
> > >>>>>> on the IBM JDK, 1.6) on AIX 6.1 TL8, Apache Archiva is sitting at
> > >>>>>> around 50% CPU while doing nothing.
> > >>>>>>
> > >>>>>> Obviously, this is not good!
> > >>>>>>
> > >>>>>> I've performed the AIX native analysis to get the native thread ID,
> > >>>>>> mapped it to a Java thread, triggered a javacore (thread dump), and
> > >>>>>> I've found this as the culprit:
> > >>>>>>
> > >>>>>> 3XMTHREADINFO  "AsyncLoggerConfig-1" J9VMThread:0x0000000031D14600,
> > >>>>>> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508,
> > >>>>>> state:CW, prio=5
> > >>>>>> 3XMJAVALTHREAD     (java/lang/Thread getId:0x6A, isDaemon:true)
> > >>>>>> 3XMTHREADINFO1     (native thread ID:0x2BF00F9, native priority:0x5,
> > >>>>>> native policy:UNKNOWN)
> > >>>>>> 3XMHEAPALLOC       Heap bytes allocated since last GC cycle=0 (0x0)
> > >>>>>> 3XMTHREADINFO3     Java callstack:
> > >>>>>> 4XESTACKTRACE        at sun/misc/Unsafe.park(Native Method)
> > >>>>>> 4XESTACKTRACE        at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332)
> > >>>>>> 4XESTACKTRACE        at com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66)
> > >>>>>> 4XESTACKTRACE
> > >
> > >
> > >
> > > --
> > > Matt Sicker <[email protected]>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
>