[ 
https://issues.apache.org/jira/browse/HTTPCORE-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659942#action_12659942
 ] 

Marc Beyerle commented on HTTPCORE-155:
---------------------------------------

Hi all,

First off, a big "Thank You" for all your comments / concerns. They are all 
valid, and I will try to address them as thoroughly as I can:

- The root cause of the problem is not really performance per se, but the fact 
that the IBM JVM blocks completely when one thread is executing a select() 
operation and another thread tries to perform an interestOps() operation. This 
can be observed, for example, when running the Apache Synapse "Stock Quote" 
sample: it simply runs into a timeout after a while, so the issue really boils 
down to a "works / doesn't work" problem, not a "works faster / works slower" 
one. We may argue that the SelectionKey contract could be improved, but the 
contract has been around for quite some time, and we have to somehow 
accommodate JVM implementations that exhibit a runtime behavior like IBM's. 
Although unlikely, even Sun could change its JVM's behavior in a future 
release.

- The basic idea behind my patch is exactly as outlined by Sam. First, I 
record interestOps() operations and store them in a queue. Second, both 
select() and the queued interestOps() operations are executed by one and the 
same thread, so blocking situations are "naturally" avoided by mutual 
exclusion. Although this sounds like considerable overhead at first, 
performance tests show that it actually isn't: the Sun JVM performs exactly 
the same amount of I/O in exactly the same amount of time, regardless of 
whether my patch or the original 4.0-beta1 code is used.
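For illustration, here is a minimal, self-contained sketch of that pattern 
(class and method names are mine, not HttpCore's actual API): arbitrary 
threads enqueue interest-op changes and wake the selector, and only the 
reactor thread ever touches SelectionKey.interestOps(), draining the queue 
around its select() call.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the queue-and-drain idea; not HttpCore's code.
class InterestOpsQueueingReactor {
    private final Selector selector;
    private final Queue<Runnable> interestOpsQueue = new ConcurrentLinkedQueue<>();

    InterestOpsQueueingReactor(Selector selector) {
        this.selector = selector;
    }

    // May be called from any thread: record the change, then wake the selector
    // so the reactor thread drains the queue promptly.
    void requestInterestOps(SelectionKey key, int ops) {
        interestOpsQueue.add(() -> key.interestOps(ops));
        selector.wakeup();
    }

    // Called only on the reactor thread.
    void drainInterestOpsQueue() {
        Runnable change;
        while ((change = interestOpsQueue.poll()) != null) {
            change.run();
        }
    }

    // One iteration of the reactor loop: apply pending changes, then select.
    void doOneSelect(long timeoutMillis) throws IOException {
        drainInterestOpsQueue();
        selector.select(timeoutMillis); // returns early on wakeup()
        drainInterestOpsQueue();
    }
}
```

Since interestOps() is now only ever invoked from the thread that also calls 
select(), the two can no longer contend on the selector's internal locks.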

- I originally reduced the select() timeout to 1ms in order to process the 
interestOps() queue as quickly as possible and to avoid situations where a 
huge number of interestOps() operations pile up. I didn't like it either, but 
couldn't think of any other way to achieve timely processing of the queue. 
However, I have changed the patch to include the idea outlined in one of the 
comments above: select() now uses the original "selectTimeout" again, and I 
wakeup() the selector after enqueueing an interestOps() operation. Hence, when 
there is nothing to do, the I/O reactor sleeps. I ran my tests again, and this 
even slightly improved performance. Funnily enough, although the 1ms loop 
seemed tight, it didn't consume CPU when I looked at my Linux system with 
"sadc".
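The wakeup() mechanism is what makes the longer timeout safe: even if the 
reactor thread is parked inside select() for the full "selectTimeout", a call 
to Selector.wakeup() from the enqueueing thread makes select() return 
immediately, so queued changes are still picked up promptly. A small 
standalone demonstration (my own example, not part of the patch):

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Demonstrates that wakeup() interrupts a long select() timeout early.
class WakeupDemo {
    // Blocks in select() with the given timeout while another thread calls
    // wakeup() after ~50ms; returns the elapsed time in milliseconds.
    static long selectWithWakeup(long timeoutMillis) throws IOException {
        Selector selector = Selector.open();
        long start = System.nanoTime();
        Thread waker = new Thread(() -> {
            sleep(50);
            selector.wakeup(); // makes the pending select() return at once
        });
        waker.start();
        selector.select(timeoutMillis); // would otherwise sleep the full timeout
        selector.close();
        return (System.nanoTime() - start) / 1_000_000;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

So the reactor genuinely sleeps when idle, yet still reacts to new 
interestOps() requests with negligible latency.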

- An extra thanks goes to Asankha, whose tests pointed me to something I had 
overlooked (it took me quite some time to figure out): of course I have to 
flush the interestOps() queue before returning the current interest set. 
Otherwise, the interest set may still be changed by an operation waiting in 
the queue after the supposedly current value has been returned. I changed the 
patch accordingly, and the unit tests now run correctly with the IBM JVM 
again. Funnily enough, the problem doesn't occur with the Sun JVM - this 
confirms the assumption that we are also dealing with thread timing issues 
here.
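The flush-before-read requirement can be shown in isolation with a toy model 
(again my own illustration, with the interest set reduced to a plain int): a 
reader that skips the flush may report a value that a queued change is about 
to overwrite.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy model of a queued interest set; not HttpCore's implementation.
class QueuedInterestSet {
    private final Queue<Integer> pending = new ConcurrentLinkedQueue<>();
    private int interestOps;

    // May be called from any thread.
    void enqueueOps(int ops) {
        pending.add(ops);
    }

    // Applies all queued changes; called on the reactor thread.
    private void flush() {
        Integer ops;
        while ((ops = pending.poll()) != null) {
            interestOps = ops;
        }
    }

    // Correct accessor: flush queued changes before reporting the value.
    int currentOps() {
        flush();
        return interestOps;
    }

    // Buggy accessor for comparison: may return a stale value.
    int currentOpsWithoutFlush() {
        return interestOps;
    }
}
```

With the flush in place, the returned value reflects every change that was 
already enqueued, which is exactly what the corrected patch guarantees.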

Cheers,
Marc

> Performance issues with IBM JRE 6.0
> -----------------------------------
>
>                 Key: HTTPCORE-155
>                 URL: https://issues.apache.org/jira/browse/HTTPCORE-155
>             Project: HttpComponents HttpCore
>          Issue Type: Bug
>          Components: HttpCore NIO
>    Affects Versions: 4.0-beta1
>         Environment: Windows 2003 SP2 - IBM J2RE 1.6.0 build 2.4 - HTTPCore 
> Beta1 - Dual Core CPU 3.0Ghz - 1Gbps networking
>            Reporter: Tom McSorley
>             Fix For: 4.1
>
>         Attachments: AbstractIOReactor.diff, AbstractIOReactor.java, 
> IOSessionImpl.diff, IOSessionImpl.java, 
> javacore.20081203.153723.32300.0001.txt, myoutput.txt, patch.08-12-17.tar.gz, 
> patch.08-12-18.tar.gz, patch.08-12-22.tar.gz, patch.08-12-30.tar.gz
>
>
> I'm issuing a second HTTP Request on a connection that has very recently 
> returned a null for the submitRequest() call...  this 2nd request is being 
> issued approximately 500ms after the submitRequest() null is returned... so 
> the connection has just been established, an HTTP Request/Response-200 cycle 
> has completed just prior to this 2nd request being issued.  I'm seeing 
> unusually long delays in the requestOutput() call (verified by surrounding 
> timing prints)... that can range anywhere from a few milliseconds on up to 60 
> seconds...  It eventually unwinds, and then the submitRequest() is called... 
> this 2nd request is dispatched and works fine... but, it is delayed 
> considerably...  Is this a known issue and is there a possible work-around?
> Here's the JVM related thread information:
> The thread being delayed and stuck in the requestOutput() call for a long 
> time (mostly longer than 5 seconds):
> 3XMTHREADINFO      "pool-2-thread-5" TID:0x2AEECE00, j9thread_t:0x2A7189A8, 
> state:B, prio=5
> 3XMTHREADINFO1            (native thread ID:0x1B44, native priority:0x5, 
> native policy:UNKNOWN)
> 4XESTACKTRACE          at 
> sun/nio/ch/SelectionKeyImpl.interestOps(SelectionKeyImpl.java:60)
> 4XESTACKTRACE          at 
> org/apache/http/impl/nio/reactor/IOSessionImpl.setEvent(IOSessionImpl.java:113)
> 4XESTACKTRACE          at 
> org/apache/http/impl/nio/NHttpConnectionBase.requestOutput(NHttpConnectionBase.java:158)
> .... (non important stack information removed)
> 4XESTACKTRACE          at 
> java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
> 4XESTACKTRACE          at java/lang/Thread.run(Thread.java:735)
> Here's the monitor that this thread is blocked and waiting on:
> 2LKMONINUSE      sys_mon_t:0x2A708AF8 infl_mon_t: 0x2A708B30:
> 3LKMONOBJECT       sun/nio/ch/uti...@00b09208/00B09214: Flat locked by "I/O 
> dispatcher 7" (0x2A208E00), entry count 1
> 3LKWAITERQ            Waiting to enter:
> 3LKWAITER                "pool-2-thread-5" (0x2AEECE00)
> And here's the thread that currently has this monitor locked:
> 3XMTHREADINFO      "I/O dispatcher 7" TID:0x2A208E00, j9thread_t:0x2A6EC73C, 
> state:R, prio=5
> 3XMTHREADINFO1            (native thread ID:0x830, native priority:0x5, 
> native policy:UNKNOWN)
> 4XESTACKTRACE          at 
> sun/nio/ch/WindowsSelectorImpl$SubSelector.poll0(Native Method)
> 4XESTACKTRACE          at 
> sun/nio/ch/WindowsSelectorImpl$SubSelector.poll(WindowsSelectorImpl.java:308(Compiled
>  Code))
> 4XESTACKTRACE          at 
> sun/nio/ch/WindowsSelectorImpl$SubSelector.access$500(WindowsSelectorImpl.java(Compiled
>  Code))
> 4XESTACKTRACE          at 
> sun/nio/ch/WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:162(Compiled 
> Code))
> 4XESTACKTRACE          at 
> sun/nio/ch/SelectorImpl.lockAndDoSelect(SelectorImpl.java:69(Compiled Code))
> 4XESTACKTRACE          at 
> sun/nio/ch/SelectorImpl.select(SelectorImpl.java:80(Compiled Code))
> 4XESTACKTRACE          at 
> org/apache/http/impl/nio/reactor/AbstractIOReactor.execute(AbstractIOReactor.java:121)
> 4XESTACKTRACE          at 
> org/apache/http/impl/nio/reactor/BaseIOReactor.execute(BaseIOReactor.java:70)
> 4XESTACKTRACE          at 
> org/apache/http/impl/nio/reactor/AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:318)
> 4XESTACKTRACE          at java/lang/Thread.run(Thread.java:735)
> I should also note that we're attempting to use 1000 client instances on this 
> single system... each with potentially 2 active connections simultaneously... 
> there is also virtually no CPU load (i.e. less than 5%) on this system...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

