Carsten Ziegeler created SLING-13132:
----------------------------------------

             Summary: ABBA deadlock between JobConsumerManager.topicToConsumerMap and QueueJobCache.cache locks
                 Key: SLING-13132
                 URL: https://issues.apache.org/jira/browse/SLING-13132
             Project: Sling
          Issue Type: Bug
          Components: Event
            Reporter: Carsten Ziegeler


A concurrency review identified a lock-order inversion (ABBA deadlock) between 
two locks:

Thread A (OSGi unbind path): JobConsumerManager.unbindService() holds 
synchronized(topicToConsumerMap), then calls context.asyncProcessingFinished() 
which triggers the async handler chain: finishedJob() -> reschedule() -> 
requeue() -> cache.reschedule() -> synchronized(cache)

Thread B (job processing): JobQueueImpl.startJobs() -> cache.getNextJob() holds 
synchronized(cache), then calls jobConsumerManager.getExecutor() -> 
synchronized(topicToConsumerMap)

Lock ordering:
- Thread A: topicToConsumerMap -> cache
- Thread B: cache -> topicToConsumerMap

This is a classic ABBA deadlock: each thread holds the lock the other one 
needs, so neither can proceed. With a retry delay <= 0, the requeue path is 
synchronous and the deadlock is reachable in production. This can cause a 
complete system freeze.
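
The two acquisition orders can be illustrated with a minimal Java sketch. The 
class and method names below are hypothetical stand-ins that only model the 
nesting of the two monitors; they are not the Sling implementation. Both paths 
are called sequentially on one thread so the sketch itself cannot hang.

```java
// Hypothetical stand-ins: these two monitors only model the locks named in
// the issue; they are not the actual Sling classes.
class LockOrderSketch {
    static final Object topicToConsumerMap = new Object();
    static final Object cache = new Object();

    // Models Thread A (unbind path): topicToConsumerMap first, then cache.
    static String unbindPath() {
        StringBuilder order = new StringBuilder();
        synchronized (topicToConsumerMap) {
            order.append("topicToConsumerMap");
            synchronized (cache) {
                order.append(" -> cache");
            }
        }
        return order.toString();
    }

    // Models Thread B (job processing): cache first, then topicToConsumerMap.
    static String startJobsPath() {
        StringBuilder order = new StringBuilder();
        synchronized (cache) {
            order.append("cache");
            synchronized (topicToConsumerMap) {
                order.append(" -> topicToConsumerMap");
            }
        }
        return order.toString();
    }

    public static void main(String[] args) {
        // Sequential calls on a single thread: no deadlock is possible here.
        System.out.println("Thread A order: " + unbindPath());
        System.out.println("Thread B order: " + startJobsPath());
    }
}
```

If the two methods ran on separate threads and each acquired its outer monitor 
before the other reached its inner one, both would block indefinitely.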

Proposed fix: In unbindService(), collect the async callback contexts into a 
local list while holding the topicToConsumerMap lock, then invoke 
asyncProcessingFinished() outside the lock. This breaks the topicToConsumerMap 
-> cache path while preserving the cache -> topicToConsumerMap path unchanged.
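
A minimal sketch of the collect-then-invoke pattern described in the proposed 
fix might look as follows. The AsyncCallbackContext interface, the map's value 
type, and the bind() and holdsMapLock() helpers are assumptions made for 
illustration only; the real JobConsumerManager differs in its types and 
signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the async callback context mentioned in the issue.
interface AsyncCallbackContext {
    void asyncProcessingFinished();
}

class UnbindSketch {
    // Stand-in for JobConsumerManager.topicToConsumerMap (value type assumed).
    private final Map<String, List<AsyncCallbackContext>> topicToConsumerMap = new HashMap<>();

    // Hypothetical helper to register a context for a topic.
    void bind(String topic, AsyncCallbackContext ctx) {
        synchronized (topicToConsumerMap) {
            topicToConsumerMap.computeIfAbsent(topic, k -> new ArrayList<>()).add(ctx);
        }
    }

    void unbindService(String topic) {
        // Step 1: collect the contexts into a local list while holding the lock.
        final List<AsyncCallbackContext> pending = new ArrayList<>();
        synchronized (topicToConsumerMap) {
            List<AsyncCallbackContext> removed = topicToConsumerMap.remove(topic);
            if (removed != null) {
                pending.addAll(removed);
            }
        }
        // Step 2: invoke the callbacks after the lock has been released, so
        // anything they lock (such as the cache) is no longer nested inside it.
        for (AsyncCallbackContext ctx : pending) {
            ctx.asyncProcessingFinished();
        }
    }

    // Hypothetical helper: reports whether the calling thread holds the map lock.
    boolean holdsMapLock() {
        return Thread.holdsLock(topicToConsumerMap);
    }

    public static void main(String[] args) {
        UnbindSketch manager = new UnbindSketch();
        manager.bind("example/topic", () ->
            System.out.println("lock held during callback: " + manager.holdsMapLock()));
        manager.unbindService("example/topic");
    }
}
```

The local list is the key design choice: state changes stay atomic under the 
topicToConsumerMap lock, while the callbacks, which may go on to acquire the 
cache lock, run with no lock held by this method.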



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
