[ 
https://issues.apache.org/jira/browse/JCR-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davide Maestroni updated JCR-3597:
----------------------------------

    Attachment: thread_dump.txt

Thread dump log file
                
> Deadlock when using LockManager session-scoped locks on nodes
> -------------------------------------------------------------
>
>                 Key: JCR-3597
>                 URL: https://issues.apache.org/jira/browse/JCR-3597
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core
>    Affects Versions: 2.6.1
>         Environment: Sling + Jackrabbit
>            Reporter: Davide Maestroni
>            Priority: Blocker
>         Attachments: thread_dump.txt
>
>
> I have a repository created with Jackrabbit and accessed through HTTP APIs 
> via Sling. I use the LockManager class to create locks on nodes before adding 
> new children and updating properties. While testing the performance of the 
> system, I ran into an issue when using different threads to add child nodes 
> to different parent nodes: a deadlock occurs quite consistently when one 
> thread is logging out from a JCR session while another one is saving its own 
> session (I'll attach the thread dump to the JIRA issue).
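> A minimal sketch (not taken from my actual code) of the kind of operation 
> each test thread performs; the class, node names and node types here are 
> made up for illustration:
>
>     import javax.jcr.Node;
>     import javax.jcr.RepositoryException;
>     import javax.jcr.Session;
>     import javax.jcr.lock.LockManager;
>
>     public class LockedChildWriter {
>         // Each worker thread calls this against its own parent node,
>         // using its own Session obtained from the repository.
>         static void addChildUnderLock(Session session, String parentPath, String childName)
>                 throws RepositoryException {
>             Node parent = session.getNode(parentPath);
>             if (!parent.isNodeType("mix:lockable")) {
>                 parent.addMixin("mix:lockable");   // node must be lockable before lock()
>                 session.save();
>             }
>             LockManager lockManager = session.getWorkspace().getLockManager();
>             // shallow, session-scoped lock on the parent node
>             lockManager.lock(parentPath, false, true, Long.MAX_VALUE, null);
>             try {
>                 Node child = parent.addNode(childName, "nt:unstructured");
>                 child.setProperty("createdBy", session.getUserID());
>                 session.save();
>             } finally {
>                 lockManager.unlock(parentPath);
>             }
>         }
>     }
>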
> So I started inspecting the source code, trying to understand what the 
> problem was, and I think I have located it. It all happens when the 
> SharedItemStateManager.beginUpdate() method is called: one thread, inside 
> Update.begin(), acquires the monitor of the LocalItemStateManager instance 
> and waits for the ISMLocking lock to be released (Thread t@295 in the logs); 
> meanwhile the other thread, which owns the ISMLocking lock, triggers an event 
> inside Update.end() which in turn tries to synchronize on the 
> LocalItemStateManager instance (Thread t@293 in the logs).
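> To make the interleaving easier to see, here is a stripped-down, standalone 
> sketch of the same lock-ordering inversion; the class and field names are 
> illustrative only and do not come from the Jackrabbit sources:
>
>     import java.util.concurrent.locks.ReentrantReadWriteLock;
>
>     public class LockOrderInversionSketch {
>         // stands in for the LocalItemStateManager instance monitor
>         static final Object ismMonitor = new Object();
>         // stands in for the ISMLocking write lock held across the update
>         static final ReentrantReadWriteLock ismLocking = new ReentrantReadWriteLock();
>
>         public static void main(String[] args) {
>             Thread saver = new Thread(() -> {
>                 // like Update.begin(): enter the monitor, then wait for the ISM lock
>                 synchronized (ismMonitor) {
>                     ismLocking.writeLock().lock();
>                     try { /* persist the change log */ }
>                     finally { ismLocking.writeLock().unlock(); }
>                 }
>             }, "t-save");
>
>             Thread dispatcher = new Thread(() -> {
>                 // like Update.end(): already inside the ISM lock, event delivery
>                 // then tries to enter the instance monitor
>                 ismLocking.writeLock().lock();
>                 try {
>                     synchronized (ismMonitor) { /* deliver observation events */ }
>                 } finally {
>                     ismLocking.writeLock().unlock();
>                 }
>             }, "t-dispatch");
>
>             saver.start();
>             dispatcher.start();
>             // depending on timing, the two threads end up waiting on each other forever
>         }
>     }
>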
> I noticed that the LocalItemStateManager implementation uses synchronization 
> in only two places: the edit() method, which is synchronized as a whole, and 
> a synchronized block in the getItemState() method that guards access to the 
> cache object. Both look odd: edit() is the only synchronized method, yet 
> editMode and changeLog are modified in several other places, so it is hard to 
> see what that synchronization actually achieves; and the cache block only 
> needs to be atomic, which could be done by synchronizing on a dedicated 
> object instead of locking the whole instance.
> Another suspicious piece of code is inside the LockManagerImpl class, where 
> two blocks are synchronized on the LocalItemStateManager instance. Again, 
> since no method of that class except edit() is synchronized, nothing prevents 
> another thread from calling into the LocalItemStateManager instance at the 
> same time.
> I finally traced the root cause of my issue to the "synchronized (this)" 
> block (line 170) inside the LocalItemStateManager.getItemState() method. 
> Since the comment states that the lock is only there to make the calls to 
> the cache atomic, I tried adding a simple object field named "monitor" to 
> the class and using "synchronized (monitor)" instead. The patch appears to 
> resolve the deadlock, but I am still puzzled by the pieces of code mentioned 
> above.
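> For reference, this is the shape of the workaround I tried, reduced to a 
> self-contained sketch (the real LocalItemStateManager code is of course more 
> involved; the cache type here is only a placeholder):
>
>     import java.util.HashMap;
>     import java.util.Map;
>
>     class CacheGuardSketch {
>         // dedicated monitor guarding only the cache, instead of "synchronized (this)"
>         private final Object monitor = new Object();
>         private final Map<String, Object> cache = new HashMap<>();
>
>         Object getItemState(String id) {
>             // cache access stays atomic, but threads that synchronize on the
>             // manager instance itself can no longer deadlock against this block
>             synchronized (monitor) {
>                 return cache.get(id);
>             }
>         }
>
>         void cacheItemState(String id, Object state) {
>             synchronized (monitor) {
>                 cache.put(id, state);
>             }
>         }
>     }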

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
