Unico Hommes created JCR-3457:
---------------------------------
Summary: Possible deadlock during external update
Key: JCR-3457
URL: https://issues.apache.org/jira/browse/JCR-3457
Project: Jackrabbit Content Repository
Issue Type: Bug
Affects Versions: 2.4.3, 2.2.13
Reporter: Unico Hommes
During an external update, ItemStateListeners are called while a write lock is
held. See SharedItemStateManager#externalUpdate:1548 ->
SharedItemStateManager#doExternalUpdate:1616 -> ChangeLog#persisted:303,307,312
-> etc.
One of these ItemStateListeners is CachingHierarchyManager, which synchronizes
on a plain Java monitor, 'cacheMonitor', in many of its notify methods (for
instance CachingHierarchyManager#nodeRemoved:467) and, while holding that
monitor, may make calls that read item states (for instance
CachingHierarchyManager#nodeRemoved:748).
These notify methods are called not only during external updates but also
during regular session and node operations, for instance
NodeImpl.removeChildNode:642 -> NodeState#removeChildNodeEntry:447 ->
NodeState#notifyNodeRemoved:911 -> etc.
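The inverted lock ordering boils down to the following minimal sketch. The class
and field names are illustrative stand-ins for CachingHierarchyManager.cacheMonitor
and the SISM read/write lock, not the actual Jackrabbit code:

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class DeadlockSketch {
        // Stand-ins for CachingHierarchyManager.cacheMonitor and the SISM lock.
        private final Object cacheMonitor = new Object();
        private final ReadWriteLock ismLock = new ReentrantReadWriteLock();

        // Path 1: a regular session/node operation. The listener synchronizes on
        // cacheMonitor first and then needs a read lock to resolve item states.
        void nodeRemovedListener() {
            synchronized (cacheMonitor) {
                ismLock.readLock().lock();   // blocks while an external update holds the write lock
                try {
                    // ... read item states to update the hierarchy cache ...
                } finally {
                    ismLock.readLock().unlock();
                }
            }
        }

        // Path 2: an external update. The write lock is taken first and the
        // listeners (which synchronize on cacheMonitor) are notified inside it.
        void externalUpdate() {
            ismLock.writeLock().lock();
            try {
                synchronized (cacheMonitor) {  // blocks while path 1 holds cacheMonitor
                    // ... notify state listeners, which update the hierarchy cache ...
                }
            } finally {
                ismLock.writeLock().unlock();
            }
        }
    }

If one thread runs nodeRemovedListener() while another runs externalUpdate(),
each ends up holding the lock the other needs.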
The scenario is then as follows.
1. A client performs such a regular session or node operation, for instance
removing a property from a node. The stack of that thread looks something like
this (line numbers of JR 2.2.13):
    at java.lang.Object.wait(Native Method)
    - waiting on <0xe9f3d3d0> (a EDU.oswego.cs.dl.util.concurrent.Latch)
    at java.lang.Object.wait(Object.java:485)
    at EDU.oswego.cs.dl.util.concurrent.Latch.acquire(Unknown Source)
    - locked <0xe9f3d3d0> (a EDU.oswego.cs.dl.util.concurrent.Latch)
    at org.apache.jackrabbit.core.state.FineGrainedISMLocking.acquireReadLock(FineGrainedISMLocking.java:123)
    at org.apache.jackrabbit.core.state.SharedItemStateManager.acquireReadLock(SharedItemStateManager.java:1856)
    at org.apache.jackrabbit.core.state.SharedItemStateManager.getItemState(SharedItemStateManager.java:261)
    at org.apache.jackrabbit.core.state.LocalItemStateManager.getNodeState(LocalItemStateManager.java:109)
    at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.getNodeState(HippoLocalItemStateManager.java:432)
    at org.apache.jackrabbit.core.state.LocalItemStateManager.getItemState(LocalItemStateManager.java:174)
    - locked <0xe0382110> (a org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager)
    at org.apache.jackrabbit.core.state.ForkedXAItemStateManager.getItemState(ForkedXAItemStateManager.java:261)
    at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.getCanonicalItemState(HippoLocalItemStateManager.java:328)
    at org.hippoecm.repository.dataprovider.HippoVirtualProvider.getCanonicalNodeState(HippoVirtualProvider.java:214)
    at org.hippoecm.repository.jackrabbit.FacetResultSetProvider.populate(FacetResultSetProvider.java:251)
    at org.hippoecm.repository.dataprovider.HippoNodeId.populate(HippoNodeId.java:85)
    at org.hippoecm.repository.jackrabbit.HippoLocalItemStateManager.getItemState(HippoLocalItemStateManager.java:351)
    at org.apache.jackrabbit.core.state.SessionItemStateManager.getItemState(SessionItemStateManager.java:161)
    at org.apache.jackrabbit.core.HierarchyManagerImpl.getItemState(HierarchyManagerImpl.java:152)
    at org.apache.jackrabbit.core.CachingHierarchyManager.nodeRemoved(CachingHierarchyManager.java:748)
    at org.apache.jackrabbit.core.CachingHierarchyManager.nodeRemoved(CachingHierarchyManager.java:472)
    - locked <0xe04ed0e8> (a java.lang.Object)
    at org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyNodeRemoved(StateChangeDispatcher.java:191)
    at org.apache.jackrabbit.core.state.SessionItemStateManager.nodeRemoved(SessionItemStateManager.java:915)
    at org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyNodeRemoved(StateChangeDispatcher.java:191)
    at org.apache.jackrabbit.core.state.LocalItemStateManager.nodeRemoved(LocalItemStateManager.java:606)
    at org.apache.jackrabbit.core.state.NodeState.notifyNodeRemoved(NodeState.java:906)
    at org.apache.jackrabbit.core.state.NodeState.removeChildNodeEntry(NodeState.java:443)
Notice that this thread has successfully acquired the lock on
CachingHierarchyManager.cacheMonitor but is now waiting for a read lock in the
SharedItemStateManager.
2. That read lock cannot be granted because an external update is taking place
at the same time. The stack of that thread looks something like this (line
numbers of JR 2.2.13):
"jackrabbit-pool-2" daemon prio=10 tid=0xcdbd5400 nid=0x7463 waiting for
monitor entry [0xcd2ae000]
java.lang.Thread.State: BLOCKED (on object monitor)
at
org.apache.jackrabbit.core.CachingHierarchyManager.nodeModified(CachingHierarchyManager.java:322)
- waiting to lock <0xe04ed0e8> (a java.lang.Object)
at
org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyNodeModified(StateChangeDispatcher.java:175)
at
org.apache.jackrabbit.core.state.SessionItemStateManager.nodeModified(SessionItemStateManager.java:902)
at
org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyNodeModified(StateChangeDispatcher.java:175)
at
org.apache.jackrabbit.core.state.LocalItemStateManager.stateModified(LocalItemStateManager.java:515)
at
org.apache.jackrabbit.core.state.ForkedXAItemStateManager.stateModified(ForkedXAItemStateManager.java:597)
at
org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyStateModified(StateChangeDispatcher.java:107)
at
org.apache.jackrabbit.core.state.SharedItemStateManager.stateModified(SharedItemStateManager.java:411)
at
org.apache.jackrabbit.core.state.ItemState.notifyStateUpdated(ItemState.java:242)
at
org.apache.jackrabbit.core.state.ChangeLog.persisted(ChangeLog.java:300)
at
org.apache.jackrabbit.core.state.SharedItemStateManager.doExternalUpdate(SharedItemStateManager.java:1596)
at
org.apache.jackrabbit.core.state.SharedItemStateManager.externalUpdate(SharedItemStateManager.java:1528)
In other words, the latter thread successfully obtained the write lock in
SharedItemStateManager#externalUpdate:1545 but is now waiting for the
cacheMonitor lock held by the thread in step 1 to be released: a classic
lock-ordering deadlock.
I don't see an easy way out of this. We can't move the notification of item
state listeners outside the scope of the write lock, can we? Changing
CachingHierarchyManager so that it never makes calls that may need a SISM read
lock while holding cacheMonitor also seems like a far-fetched idea.
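For what it's worth, here is a rough sketch of that second direction, reordered
so that any work that may need the read lock happens before the monitor is
entered. All names below are hypothetical and simplified; this only illustrates
the two-phase idea and is not a patch for CachingHierarchyManager:

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class TwoPhaseListenerSketch {
        private final Object cacheMonitor = new Object();
        private final ReadWriteLock ismLock = new ReentrantReadWriteLock();

        void nodeRemoved(String removedId) {
            // Phase 1: resolve everything that needs the read lock while the
            // monitor is NOT held, so the two locks are never nested.
            String resolvedParent;
            ismLock.readLock().lock();
            try {
                resolvedParent = resolveParent(removedId);
            } finally {
                ismLock.readLock().unlock();
            }

            // Phase 2: update the cache under the monitor only; no read lock is
            // requested inside this block.
            synchronized (cacheMonitor) {
                evict(removedId);
                remap(resolvedParent);
            }
        }

        // Placeholders standing in for the real cache bookkeeping.
        private String resolveParent(String id) { return "parent-of-" + id; }
        private void evict(String id) { /* remove the cached entry */ }
        private void remap(String parent) { /* adjust cached paths */ }
    }

Whether the information the listener needs can actually be gathered up front
like this in CachingHierarchyManager is an open question.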
Because the deadlock occurs while a write lock is held, the whole repository
becomes unresponsive.