Davide Maestroni created JCR-3597:
-------------------------------------

             Summary: Dead lock when using LockManager session scoped locks on 
nodes
                 Key: JCR-3597
                 URL: https://issues.apache.org/jira/browse/JCR-3597
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
    Affects Versions: 2.6.1
         Environment: Sling + Jackrabbit
            Reporter: Davide Maestroni
            Priority: Blocker
         Attachments: thread_dump.txt

I have a repository created using Jackrabbit and accessed through HTTP APIs via 
Sling. I use the LockManager class to create locks on nodes before adding new 
children and updating properties. When testing the speed of the system I ran 
across an issue when using different threads to add child nodes to different 
parent nodes: a dead lock happens quite consistently when one thread is logging 
out from a JCR session and another one is saving its own session (I'll attach 
the thread dump to the JIRA issue).
So I started inspecting the source code, trying to understand what the problem 
was, and I think I located the problem. Basically it all happens when calling 
the SharedItemStateManager.beginUpdate() method: one thread, inside the 
Update.begin(), acquires the synchronized lock on the LocalItemStateManager 
instance and wait for the ISMLocking lock to be released (Thread t@295 in the 
logs); while the other, which owns the ISMLocking lock, inside the Update.end() 
triggers an event which in turn try to synchronize with the 
LocalItemStateManager instance (Thread t@293 in the logs).
I noticed that the LocalItemStateManager implementation employs synchronized 
blocks only in two instances: one in the edit() method to synchronize the whole 
function, and one in the getItemState() method to synchronize the access to the 
cache object. Both are quite odd since: the edit() is the only synchronized 
method and there are several other places in which editMode and changeLog are 
modified, so it is quite difficult to understand the use of it; the cache block 
to be atomic can simply employ a dedicated object to synchronize on instead of 
locking the whole instance.
Another suspicious piece of code is inside the LockManagerImpl class, where 
there are two blocks of code synchronized on the LocalItemStateManager 
instance. Again, since no method of the class, with the exception of the 
edit(), is synchronized, nothing prevents another thread to call the 
LocalItemStateManager instance at the same time.
I finally identified the root cause of my issue in the "synchronized (this)" 
(line 170) inside the LocalItemStateManager.getItemState() method and, since 
the comment states that the lock is there just to ensure the calls to the cache 
to be atomically executed, I tried to add a simple object field name "monitor" 
to the class and use "synchronized (monitor)" instead. The patch looks to solve 
the dead lock, still I am confused about the pieces of code mentioned above.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to