[
https://issues.apache.org/jira/browse/HTTPCORE-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659942#action_12659942
]
Marc Beyerle commented on HTTPCORE-155:
---------------------------------------
Hi all,
First off, a big "Thank You" for all your comments / concerns. They are all
valid, and I will try to address them as thoroughly as I can:
- The root cause of the problem is not really related to performance per se,
but to the fact that the IBM JVM fully blocks, when one thread is executing a
select() operation and another thread tries to perform an interestOps()
operation. This can be observed (for example) when running the Apache Synapse
"Stock Quote" sample: It simply runs into a timeout after a while, so the issue
really boils down to a "works / doesn't work" and not a "works faster / works
slower" problem. We may now argue that the SelectionKey contract is improvable,
but as a matter of fact the contract has been around for quite some time and we
have to somehow address JVM implementations, which exhibit a runtime behavior
like IBM's JVM. Although unlikely, even Sun could change its JVM's behavior in
a future release.
- The basic idea behind my patch is exactly as outlined by Sam. First, I
replicate interestOps() operations and store them in a queue. Second, both
select() and interestOps() operations are executed by one and the same thread,
in order to "naturally" avoid blocking situations by mutual exclusion. Although
this sounds like quite some overhead at first, you can verify that it actually
isn't when you execute some performance tests. The Sun JVM performs exactly the
same amount of I/O operations in exactly the same amount of time, regardless of
whether either my patch or the original 4.0-beta1 code is used.
- I originally changed the select() timeout down to 1ms, in order to process
the interestOps() queue as quickly as possible and to avoid situations where a
huge amount of interestOps() operations pile up. I also didn't like it, but
couldn't think of any other way of achieving a timely processing of the queue.
However, I changed the patch to include the idea outlined in one of the above
comments: select() now has the original "selectTimeout" again, and I wakeup()
the selector after enqueueing an interestOps() operation. Hence, when there is
nothing to do, the I/O reactor sleeps. I performed my tests again, and it even
slightly improved performance. Funny enough, although the 1ms loop seemed to be
tight, it didn't consume CPU when I looked at my Linux system with "sadc".
- An extra thanks goes to Asankha, whose tests pointed me to something I forgot
(took me quite some time to figure out): Of course I have to flush the
interestOps() queue before returning the current interest set. Otherwise, the
interest set may still be changed by an operation waiting in the queue after
the supposed current value has been returned. I changed the patch accordingly
and now the unit tests run correctly with the IBM JVM again. Funny that it
doesn't occur with the Sun JVM - this affirms the assumption that we're also
dealing with thread timing issues here.
Cheers,
Marc
> Performance issues with IBM JRE 6.0
> -----------------------------------
>
> Key: HTTPCORE-155
> URL: https://issues.apache.org/jira/browse/HTTPCORE-155
> Project: HttpComponents HttpCore
> Issue Type: Bug
> Components: HttpCore NIO
> Affects Versions: 4.0-beta1
> Environment: Windows 2003 SP2 - IBM J2RE 1.6.0 build 2.4 - HTTPCore
> Beta1 - Dual Core CPU 3.0Ghz - 1Gbps networking
> Reporter: Tom McSorley
> Fix For: 4.1
>
> Attachments: AbstractIOReactor.diff, AbstractIOReactor.java,
> IOSessionImpl.diff, IOSessionImpl.java,
> javacore.20081203.153723.32300.0001.txt, myoutput.txt, patch.08-12-17.tar.gz,
> patch.08-12-18.tar.gz, patch.08-12-22.tar.gz, patch.08-12-30.tar.gz
>
>
> I'm issuing a second HTTP Request on a connection that has very recently
> returned a null for the submitRequest() call... this 2nd request is being
> issued approximately 500ms after the submitRequest() null is returned... so
> the connection has just been established, an HTTP Request/Response-200 cycle
> has completed just prior to this 2nd request being issued. I'm seeing
> unusually long delays in the requestOutput() call (verified by surrounding
> timing prints)... that can range anywhere from a few milliseconds on up to 60
> seconds... It eventually unwinds, and then the submitRequest() is called...
> this 2nd request is dispatched and works fine... but, it is delayed
> considerably... Is this a known issue and is there a possible work-around?
> Here's the JVM related thread information:
> The thread being delayed and stuck in the requestOutput() call for a long
> time (mostly longer than 5 seconds):
> 3XMTHREADINFO "pool-2-thread-5" TID:0x2AEECE00, j9thread_t:0x2A7189A8,
> state:B, prio=5
> 3XMTHREADINFO1 (native thread ID:0x1B44, native priority:0x5,
> native policy:UNKNOWN)
> 4XESTACKTRACE at
> sun/nio/ch/SelectionKeyImpl.interestOps(SelectionKeyImpl.java:60)
> 4XESTACKTRACE at
> org/apache/http/impl/nio/reactor/IOSessionImpl.setEvent(IOSessionImpl.java:113)
> 4XESTACKTRACE at
> org/apache/http/impl/nio/NHttpConnectionBase.requestOutput(NHttpConnectionBase.java:158)
> .... (non important stack information removed)
> 4XESTACKTRACE at
> java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:735)
> Here's the monitor that this thread is blocked and waiting on:
> 2LKMONINUSE sys_mon_t:0x2A708AF8 infl_mon_t: 0x2A708B30:
> 3LKMONOBJECT sun/nio/ch/uti...@00b09208/00B09214: Flat locked by "I/O
> dispatcher 7" (0x2A208E00), entry count 1
> 3LKWAITERQ Waiting to enter:
> 3LKWAITER "pool-2-thread-5" (0x2AEECE00)
> And here's the thread that currently has this monitor locked:
> 3XMTHREADINFO "I/O dispatcher 7" TID:0x2A208E00, j9thread_t:0x2A6EC73C,
> state:R, prio=5
> 3XMTHREADINFO1 (native thread ID:0x830, native priority:0x5,
> native policy:UNKNOWN)
> 4XESTACKTRACE at
> sun/nio/ch/WindowsSelectorImpl$SubSelector.poll0(Native Method)
> 4XESTACKTRACE at
> sun/nio/ch/WindowsSelectorImpl$SubSelector.poll(WindowsSelectorImpl.java:308(Compiled
> Code))
> 4XESTACKTRACE at
> sun/nio/ch/WindowsSelectorImpl$SubSelector.access$500(WindowsSelectorImpl.java(Compiled
> Code))
> 4XESTACKTRACE at
> sun/nio/ch/WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:162(Compiled
> Code))
> 4XESTACKTRACE at
> sun/nio/ch/SelectorImpl.lockAndDoSelect(SelectorImpl.java:69(Compiled Code))
> 4XESTACKTRACE at
> sun/nio/ch/SelectorImpl.select(SelectorImpl.java:80(Compiled Code))
> 4XESTACKTRACE at
> org/apache/http/impl/nio/reactor/AbstractIOReactor.execute(AbstractIOReactor.java:121)
> 4XESTACKTRACE at
> org/apache/http/impl/nio/reactor/BaseIOReactor.execute(BaseIOReactor.java:70)
> 4XESTACKTRACE at
> org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:318)
> 4XESTACKTRACE at java/lang/Thread.run(Thread.java:735)
> I should also note that we're attempting to use 1000 client instances on this
> single system... each with potentially 2 active connections simultaneously...
> there is also virtually no CPU load (i.e. less then 5%) on this system...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]