Hi Daryl, I've got your dump by mail. It is quite huge. JRockit writes the essential things at the bottom of the dump. So the best is to start reading there. What you can see that there are 162 threads being locked - so nearlly as many request are not responding or responding very slow. You do not have a JVM deadlock situation, otherwise JRockit would recoginze and write that at the bottom of the dump. Now you can only see "Blocked lock chains" and "Open lock chains".
Basically you see in your dump that there are only three threads, who are holding the locks (Ajp13Processor[8009][204] , Ajp13Processor[8009][137] and Ajp13Processor[8009][173]) All other threads are waiting for the locks to be released. So now you can go up there and see the stack trace of these three threads and you try to figure out what is happening.
From the chains at the bottom you can also see that the biggest blocker for
whom most of the threads are waiting is Ajp13Processor[8009][204]. In the thread dump you see this: "Ajp13Processor[8009][204]" id=661 idx=0x278 tid=23430 prio=5 alive, *in native, native_blocked, daemon at jrockit/vm/Allocator.nativeGetNewTLA()I(Native Method) at jrockit/vm/Allocator.getNewTLAAndAlloc(IIIZ)Ljava/lang/Object;(Unknown Source)[inlined] *at jrockit/vm/Allocator.getMoreMemoryAndAlloc(IIIIZ)Ljava/lang/Object;(Unknown Source)[optimized] at java/util/Hashtable.getEnumeration(I)Ljava/util/Enumeration;(Unknown Source)[inlined] *at java/util/Hashtable.elements()Ljava/util/Enumeration;(Unknown Source)[inlined] at org/jahia/services/usermanager/JahiaGroup.isMember(JahiaGroup.java :262)[optimized] ^-- Holding lock: java/util/[EMAIL PROTECTED] lock] *at org/jahia/services/acl/JahiaACL.getUserPermissionInGroupHashtable( JahiaACL.java:953)[inlined] JRockit tries to allocate memory from the heap and is natively blocked, because the JVM still tries to get some memory by garbage collecting. This situation is slowly leading to an OutOfMemory exception, but you made the dump before that. About 100 threads are directly waiting for the lock created by the Hashtable.elements() in JahiaGroup.isMember(), which is a synchroinzed method. In fact ALL threads are waiting for that, because also the ones who say -- Blocked trying to get lock: org/jahia/services/fields/[EMAIL PROTECTED] lock] do indirectly also wait for the lock in JahiaGroup.isMember(), because the lock in JahiaFieldBaseService is being held by thread Ajp13Processor[8009][137] and that thread itself also waits for the ONE Hastable.elements() to return, which does not get memory: "Ajp13Processor[8009][137]" id=594 idx=0x1f2 tid=23264 prio=5 alive, in native, blocked, daemon *-- Blocked trying to get lock: java/util/[EMAIL PROTECTED] lock] *at jrockit/vm/Threads.waitForSignal(J)Z(Native Method) at jrockit/vm/Locks.fatLockBlockOrSpin(ILjrockit/vm/ObjectMonitor;II)V(Unknown Source)[optimized] at jrockit/vm/Locks.lockFat(Ljava/lang/Object;ILjrockit/vm/ObjectMonitor;Z)Ljava/lang/Object;(Unknown Source)[optimized] at jrockit/vm/Locks.monitorEnterSecondStage(Ljava/lang/Object;I)Ljava/lang/Object;(Unknown Source)[optimized] *at java/util/Hashtable.elements()Ljava/util/Enumeration;(Unknown Source)[inlined] at org/jahia/services/usermanager/JahiaGroup.isMember(JahiaGroup.java :262)[optimized] *at org/jahia/services/acl/JahiaACL.getPermission(JahiaACL.java :718)[inlined] at org/jahia/services/acl/JahiaACL.getPermission(JahiaACL.java :778)[optimized] at org/jahia/services/acl/JahiaAbstractACL.getPermission( JahiaAbstractACL.java:591)[inlined] at org/jahia/data/fields/JahiaField.checkAccess(JahiaField.java :378)[inlined] at org/jahia/data/fields/JahiaField.checkReadAccess(JahiaField.java :324)[inlined] *at org/jahia/services/fields/JahiaFieldBaseService.loadField( JahiaFieldBaseService.java:777)[inlined] at org/jahia/services/containers/ContainerFactory.fullyLoadContainer( ContainerFactory.java:148)[optimized] ^-- Holding lock: org/jahia/services/fields/[EMAIL PROTECTED] lock] *at org/jahia/services/containers/ContainerFactoryProxy.load( ContainerFactoryProxy.java:129)[inlined] And then there is the lock in Ajp13Processor[8009][173]: "Ajp13Processor[8009][173]" id=630 idx=0x23a tid=23360 prio=5 alive, in native, blocked, daemon *-- Blocked trying to get lock: org/jahia/services/fields/[EMAIL PROTECTED] lock] *at jrockit/vm/Threads.waitForSignal(J)Z(Native Method) at jrockit/vm/Locks.fatLockBlockOrSpin(ILjrockit/vm/ObjectMonitor;II)V(Unknown Source)[optimized] at jrockit/vm/Locks.lockFat(Ljava/lang/Object;ILjrockit/vm/ObjectMonitor;Z)Ljava/lang/Object;(Unknown Source)[optimized] at jrockit/vm/Locks.monitorEnterSecondStage(Ljava/lang/Object;I)Ljava/lang/Object;(Unknown Source)[optimized] at jrockit/vm/Locks.monitorEnter(Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)[optimized] *at org/jahia/services/fields/JahiaFieldBaseService.saveFieldDefinition( JahiaFieldBaseService.java:1478) at org/jahia/registries/JahiaFieldDefinitionsRegistry.setDefinition( JahiaFieldDefinitionsRegistry.java:156) ^-- Holding lock: org/jahia/registries/[EMAIL PROTECTED] lock]* at org/jahia/data/containers/JahiaContainerSet.declareField( JahiaContainerSet.java:280)[optimized] at org/jahia/taglibs/declarations/DeclareFieldTag.doStartTag( DeclareFieldTag.java:114)[optimized] In this stacktrace you can see that this thread is holding the lock for the synchronized method in JahiaFieldDeinitionsRegistry.setDefinition, but the thread itself waits for the JahiaFieldBaseService lock, which is being held in thread 137, which waits for the lock in thread 204. In the forthcoming Jahia 4.2 we already have multiple refactorings to prevent these situations. We have removed many synchronized methods in order to not overlock the application, when it is not necessary. We are also releasing database connections for the waiting threads, in order to not reserve too many connections, for threads, who are waiting anyway. Thus we will also release memory. The cases in your thread dump are already known and covered. If you have more thread dumps, please send them again, just to go sure that there is no situation, which we have not seen until now. What you could do so far: If this customer has not been updated recently with a new release of your templates, then the call to *JahiaFieldBaseService.saveFieldDefinition *should not really happen too often. Please put a breakpoint in there on debugging or set the logger for *JahiaFieldDefinitionsRegistry *to debug level. If the field definition is being saved too often, then there is something wrong with the way, how you definie the fields. Fixing this, will lead to lesser locks. You could also - until Jahia 4.2 - try limiting the amount of requests Apache passes to Tomcat, and return a server busy error code or message. This way Tomcat will not be bombed with requests and not run out of memory as easy, as for now all those threads are at least opening a database connection and not closing it before waiting for the lock. In Jahia 4.2 most of those threads will release the connections. Hope I could help you, Benjamin 2007/2/21, Daryl Gerlach <[EMAIL PROTECTED]>:
I found the output it's in the catalina.out. How do I make sense of the output? I see several entries like the following: -- Blocked trying to get lock: org/jahia/services/fields/[EMAIL PROTECTED] lock] Where can I forward the zip file of the thread dump contents? Daryl Daryl Gerlach wrote: Benjamin, I tried the Kill -QUIT <pid> But it didn't work, process kept running with no output. I'm running Red Hat ES4. Confused Daryl Benjamin Papez wrote: Hello Daryl, we had similar problems, because Jahia 4.1 was running out of database connections. There were also some deadlock issues. Most of these problems will be fixed with the forthcoming Jahia 4.2 release, which will be out very soon. Meanwhile you could try to reserve enough DB connections. What you could also do is to create a thread dump ( kill -QUIT *process_id*) of the tomcat java process, when such a situation occurs. This will help determining, what the threads are doing and why they are not finishing their work. Greetings, Benjamin 2007/2/6, Daryl Gerlach <[EMAIL PROTECTED]>: > > We are having issues running Jahia 4.1.1 in production. We recently > upgraded our servers to Red Hat ES 4 and added apache 2.0.52 with mod_jk > 1.2.20 to front end the whole thing. Under load the server stops > responding to requests. Here are the settings for workers.properties, > server.xml, and httpd.conf > > worker.properties > ============ > worker.list=consumer,builder > > # Set properties for consumer (ajp13) > worker.consumer.type=ajp13 > worker.consumer.host=localhost > worker.consumer.port=8019 > > worker.builder.type=ajp13 > worker.builder.host=localhost > worker.builder.port=8009 > > > httpd.conf > ======= > <IfModule mod_jk.c> > JkWorkersFile conf/workers.properties > JkShmFile logs/mod_jk.shm > JkLogFile logs/mod_jk.log > JkLogLevel info > JkLogStampFormat "[%a %b %d %H:%M:%S %Y] " > </IfModule> > > <VirtualHost *:80> > ... > <IfModule mod_jk.c> > # Send everything for context /examples to worker named worker1 > (ajp13) > JkMount /jahia/* builder > JkMount /pdfgenerator/* builder > </IfModule> > </VirtualHost> > > > server.xml > ======= > <Connector className="org.apache.ajp.tomcat4.Ajp13Connector " > port="8009" minProcessors="5" maxProcessors="75" > acceptCount="10" debug="0"/> > > What settings are other people using in production? > > -- Daryl Gerlach Consultant Collective Intelligence, Inc. Phone: 717-545-9234 Fax: 717-545-9232 Cell: 717-649-3499 -- Daryl Gerlach Consultant Collective Intelligence, Inc. Phone: 717-545-9234 Fax: 717-545-9232 Cell: 717-649-3499