Thanks, Claus, for responding.

Unfortunately, upgrading to 2.4.x is not an immediate option. But as I have 
backported several fixes to the 1.6.x baseline, a solution seems to be in place 
already.

I'm mostly interested in fully understanding how these deadlocks came into 
existence in the first place. Looking at the code, there is a fixed order of 
events, namely:

1. Acquire write lock on Shared ISM during prepare.
2. Downgrade the write lock to a read lock during commit.
3. Broadcast "update ended" events, e.g. to update the Lucene indexes.
3.1. Lucene acquires a read lock on Shared ISM.

Specifically, the order of steps 2. and 3. is fixed within the 
SISM.Update#end() method.
However, from my analysis it seems that sometimes (and this is really rare) 
step 3. is executed while step 2. is not yet effective. And this bends my mind.
Current revisions of SISM have lines reordered a little, but to me there is no 
clear indication that a similar situation will not occur with trunk as well.
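
To make the expected ordering concrete, here is a minimal sketch of steps 1. 
to 3.1. It is not Jackrabbit code: java.util.concurrent's 
ReentrantReadWriteLock stands in for the Shared ISM lock, the class and method 
names are made up, and I assume the listener runs on a different thread than 
the committer. A timed tryLock replaces the blocking acquire so the sketch 
reports the problem instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockDowngradeSketch {
    // Stand-in for the Shared ISM lock (assumption, not Jackrabbit's lock class).
    private final ReentrantReadWriteLock ismLock = new ReentrantReadWriteLock();

    // Step 3.1: a listener on its own thread tries to take the read lock.
    // Returns true if it obtained the lock within the timeout.
    private boolean listenerAcquiresReadLock() {
        final boolean[] acquired = {false};
        Thread listener = new Thread(() -> {
            try {
                if (ismLock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    ismLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();
        try {
            // join() makes the listener's write to acquired[0] visible here.
            listener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired[0];
    }

    // Runs one update. With downgradeBeforeNotify == true the order is
    // 1., 2., 3.; with false, step 2. is skipped until after the notification.
    boolean runUpdate(boolean downgradeBeforeNotify) {
        ismLock.writeLock().lock();            // step 1: write lock (prepare)
        boolean holdsWrite = true;
        boolean holdsRead = false;
        try {
            if (downgradeBeforeNotify) {
                ismLock.readLock().lock();     // step 2: downgrade, take read lock first
                holdsRead = true;
                ismLock.writeLock().unlock();  // then give up the write lock
                holdsWrite = false;
            }
            return listenerAcquiresReadLock(); // step 3: "update ended" notification
        } finally {
            if (holdsWrite) ismLock.writeLock().unlock();
            if (holdsRead) ismLock.readLock().unlock();
        }
    }
}
```

In this sketch runUpdate(true) returns true, because the listener can share the 
read lock, while runUpdate(false) returns false after the timeout, because the 
write lock is still held when the listener runs. With a blocking acquire and a 
committer that waits for the listener, the second case would be a deadlock.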

I guess it all boils down to the question: does all this go back to a bug 
within the JVM, or is there still some "happens before" indicator missing from 
the code?

During merge verification we have manually switched 2. and 3. and could verify 
that the fixes (specifically JCR-2820) are effective.

Thanks and kind regards
Robert
