The short version:

I claim that in Init.instance()

this line:

  | Init init = (Init) Contexts.getApplicationContext().get(Init.class);
  | 

should be replaced with :

  | Init init = (Init) 
Contexts.getApplicationContext().get("org.jboss.seam.core.init")
  | 

because the current version calles Class.getAnnotation which in turns calls 
synchronized method and this results in poor behavior under heavy load.


The long version:

First some history (and a similar previous problem):

Over the past few months I've been working with a team converting an existing 
webapp of moderate complexity  to a JSF / Seam stack w/ Facelets.

It's been in production now for about a month and a half.
Running on jboss 4.0.5.GA  w/ seam 1.2.0.patch1  Sun JSF RI  1.2 b4 (?)

Before we put it in production had to do some performance tuning.  What we 
started out with was about 10% as fast as the original app , which wasn't 
acceptable, but eventually we got to about 60%, which was fine.

One undesirable behavior was that under moderately heavy load, when requests 
were slowing down from their normal ~300ms to 3-4 seconds, the distribution of 
request response time got very wide with some requests taking 5 seconds while 
other, identical, requests were timing out at 5 minutes.

With some testing we tracked this down to the fact that 
org/apache/catalina/core/ApplicationContext.get() is being called something 
like 50K - 100K times per request in our app.  Internally this call is 
synchronized. 

The solution we tried and eventually adopted was to edit  the tomcat 
ApplicationContext code and replace the HashMap with a  ConcurrentHashMap and 
to remove the synchronization in the get().  

This resulted in something like a 30-40% performance improvement (if memory 
serves), but more importantly it significantly narrowed the response time 
curve.  That is, the response time for requests ,under load, was a lot more 
consistent.
(note 1: this makes sense since having lots of threads contending for a lock 
can lead to starvation.  In this case, the fact that any request has to "win" 
the lock 100K times before it succeeds makes it more likely that starvation 
will occur despite any anti-starvation attempts by the OS/VM)

(Note 2: This change not strictly speaking a safe thing to do,  but it works 
for us.  (Tomcat 6 also takes this approach but presumably they've gone further 
in  validating that the rest of the app is safe with the concurrent hash map 
semantics)).

THE CURRENT ISSUE:

Basically, in production under heavy load, we saw the same distribution of 
request response time. (measured by mod-log-firstbyte in apache).    Bascially 
once the per-page response got to ~7-10 seconds / page some requests start to 
take 100 - 300 seconds  (there's some timeout in the system that caps things at 
300 seconds).  The situation then often spirals out of control with active 
connections climbing to the 100s and inevitably resulting in an OOM.

This time a thread-dump of jboss showed that many threads were waiting in to 
lock a specific lock in  java.lang.Class.initAnnotationsIfNecessary().

Looking at the stack traces it became clear that the Class in question was 
Init.class and that all these threads were calling   Init.instance().
( Init.instance() is called ... everywhere... all the time).

The specific line that leads to the eventual lock contention is this one:
Init init = (Init) Contexts.getApplicationContext().get(Init.class);

Which internally leads to a call to get the value of the @Name annotation for 
the Init.class so that it can then look up the instance by that name.

It is this lookup of the Name of the Init component that causes the contention.

Replacing the get(Init.class) with get("org.jboss.seam.core.init") bypasses 
this lookup and also that synchronized method.

I've been running this modification on half of the servers for the past two 
days and  they are showing MUCH more linear performance under heavy load.  
Basically when subjected to a stressor (GC pause, DB related pause, traffic 
spike etc ) the modified machines return to normal much more quickly (say... 20 
-30 seconds  compared to 5-10 minutes)  and they haven't shown a tendency to 
spiral out of control towards OOM.

I  generally like the compile time checking that one gets from using the 
get(Class) version of this method, however in this specific case i think  the 
loss of safety is worth it given the deep loop nature of this call.


I'm curious as to what you guys think.

Thanks,
radu


View the original post : 
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4042697#4042697

Reply to the post : 
http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4042697
_______________________________________________
jboss-user mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/jboss-user

Reply via email to