[ 
https://issues.apache.org/jira/browse/SLING-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carsten Ziegeler reassigned SLING-2719:
---------------------------------------

    Assignee: Carsten Ziegeler
    
> Deadlock in ResourceResolverFactoryActivator.checkFactoryPreconditions
> ----------------------------------------------------------------------
>
>                 Key: SLING-2719
>                 URL: https://issues.apache.org/jira/browse/SLING-2719
>             Project: Sling
>          Issue Type: Bug
>          Components: ResourceResolver
>    Affects Versions: Resource Resolver 1.0.2
>         Environment: JBoss
>            Reporter: Chetan Mehrotra
>            Assignee: Carsten Ziegeler
>              Labels: deadlock
>         Attachments: error-log-threaddump.zip
>
>
> We are seeing intermittent deadlocks while running a Sling-based webapp in 
> an app server such as JBoss. The deadlock occurs between the 
> FelixFrameworkWiring and FelixStartLevel threads. 
> For example, consider the order in which locks are taken in 
> threaddump-1.log (shown below). The FelixFrameworkWiring thread holds the 
> global bundle lock at the Felix level [1] and is waiting for the lock in 
> ResourceResolverFactoryActivator.checkFactoryPreconditions, while the 
> FelixStartLevel thread holds the lock on RRF and is waiting for the global 
> lock. The result is a deadlock.
> The FelixFrameworkWiring thread [5] is busy deactivating components because 
> of an earlier package refresh (which led to the repository being shut down, 
> which in turn triggered deactivation of ResourceResolverFactoryActivator). 
> Meanwhile, the FelixStartLevel thread [6] has activated 
> ResourceResolverFactoryActivator (and thus holds its lock) and later 
> requires the global lock for some operation.
> Looking at the code for 
> ResourceResolverFactoryActivator.checkFactoryPreconditions [2], it appears to 
> take and hold a lock (on this) while calling into the OSGi container. Such 
> usage *might* cause problems such as deadlock, so it would be better if 
> ResourceResolverFactoryActivator did not hold any lock while calling 
> container services [3].
> "FelixFrameworkWiring"
> - locked <0x00000007944da478> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
> - locked <0x00000007944da9b0> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
> - locked <0x00000007944dae38> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
> - locked <0x0000000796d5d030> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.unregisterComponentService(AbstractComponentManager.java:702)
> - waiting to lock <0x000000079624ff08> (a 
> org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator) 
> org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:330)
> "FelixStartLevel"
> - locked <0x000000079624ff08> (a 
> org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator) 
> org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator.checkFactoryPreconditions(ResourceResolverFactoryActivator.java:324)
> - locked <0x0000000796959bc8> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
> - locked <0x0000000796959eb8> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
> - locked <0x000000079695a188> (a java.util.concurrent.atomic.AtomicReference) 
> org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:660)
> - waiting <0x000000079415eca0> (a [Ljava.lang.Object;) 
> org.apache.felix.framework.Felix.acquireGlobalLock(Felix.java:5019)
> [1] This has been confirmed via the value for m_globalLockThread of Felix 
> instance in Heap Dump
> [2] 
> https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L313
> [3] http://njbartlett.name/files/osgibook_preview_20091217.pdf (Section 6.4 
> Don’t Hold Locks when Calling Foreign Code)
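
The suggestion in the description (do not hold a lock while calling container services) can be illustrated with a minimal sketch. This is not the actual ResourceResolverFactoryActivator code or the fix applied for this issue. The class OpenCallSketch, the Container interface, and the method names are all hypothetical stand-ins. The idea is to update local state under a private lock, remember what has to be done, and call the container only after the lock is released.

```java
/** Hypothetical sketch: decide under a lock, call foreign code after releasing it. */
final class OpenCallSketch {

    /** Hypothetical stand-in for container services (this is not an OSGi API). */
    interface Container {
        void register(String name);
        void unregister(String name);
    }

    private final Object stateLock = new Object();
    private final Container container;
    private boolean registered; // guarded by stateLock

    OpenCallSketch(Container container) {
        this.container = container;
    }

    /** Updates local state under the lock, then calls the container without it. */
    void checkPreconditions(boolean satisfied) {
        boolean doRegister = false;
        boolean doUnregister = false;
        synchronized (stateLock) {
            if (satisfied && !registered) {
                registered = true;
                doRegister = true;
            } else if (!satisfied && registered) {
                registered = false;
                doUnregister = true;
            }
        }
        // stateLock is released here, so the container may take its own locks
        // without creating a lock-order cycle with this object.
        if (doRegister) {
            container.register("factory");
        } else if (doUnregister) {
            container.unregister("factory");
        }
    }

    /** Reports whether the calling thread currently holds stateLock. */
    boolean holdsStateLock() {
        return Thread.holdsLock(stateLock);
    }

    public static void main(String[] args) {
        final OpenCallSketch[] self = new OpenCallSketch[1];
        Container printing = new Container() {
            @Override
            public void register(String name) {
                System.out.println("register " + name
                        + ", lock held: " + self[0].holdsStateLock());
            }

            @Override
            public void unregister(String name) {
                System.out.println("unregister " + name
                        + ", lock held: " + self[0].holdsStateLock());
            }
        };
        self[0] = new OpenCallSketch(printing);
        self[0].checkPreconditions(true);
        self[0].checkPreconditions(false);
    }
}
```

One trade-off of this pattern: because the container calls happen outside the lock, two threads may issue their register and unregister calls in a different order than they updated the flag. Real code would have to tolerate or serialize that separately.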

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
