Davide Maestroni created JCR-3597:
-------------------------------------
Summary: Dead lock when using LockManager session scoped locks on
nodes
Key: JCR-3597
URL: https://issues.apache.org/jira/browse/JCR-3597
Project: Jackrabbit Content Repository
Issue Type: Bug
Components: jackrabbit-core
Affects Versions: 2.6.1
Environment: Sling + Jackrabbit
Reporter: Davide Maestroni
Priority: Blocker
Attachments: thread_dump.txt
I have a repository created using Jackrabbit and accessed through HTTP APIs via
Sling. I use the LockManager class to create locks on nodes before adding new
children and updating properties. When testing the speed of the system I ran
across an issue when using different threads to add child nodes to different
parent nodes: a dead lock happens quite consistently when one thread is logging
out from a JCR session and another one is saving its own session (I'll attach
the thread dump to the JIRA issue).
So I started inspecting the source code, trying to understand what the problem
was, and I think I located the problem. Basically it all happens when calling
the SharedItemStateManager.beginUpdate() method: one thread, inside the
Update.begin(), acquires the synchronized lock on the LocalItemStateManager
instance and wait for the ISMLocking lock to be released (Thread t@295 in the
logs); while the other, which owns the ISMLocking lock, inside the Update.end()
triggers an event which in turn try to synchronize with the
LocalItemStateManager instance (Thread t@293 in the logs).
I noticed that the LocalItemStateManager implementation employs synchronized
blocks only in two instances: one in the edit() method to synchronize the whole
function, and one in the getItemState() method to synchronize the access to the
cache object. Both are quite odd since: the edit() is the only synchronized
method and there are several other places in which editMode and changeLog are
modified, so it is quite difficult to understand the use of it; the cache block
to be atomic can simply employ a dedicated object to synchronize on instead of
locking the whole instance.
Another suspicious piece of code is inside the LockManagerImpl class, where
there are two blocks of code synchronized on the LocalItemStateManager
instance. Again, since no method of the class, with the exception of the
edit(), is synchronized, nothing prevents another thread to call the
LocalItemStateManager instance at the same time.
I finally identified the root cause of my issue in the "synchronized (this)"
(line 170) inside the LocalItemStateManager.getItemState() method and, since
the comment states that the lock is there just to ensure the calls to the cache
to be atomically executed, I tried to add a simple object field name "monitor"
to the class and use "synchronized (monitor)" instead. The patch looks to solve
the dead lock, still I am confused about the pieces of code mentioned above.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira