apoorvmittal10 commented on PR #19759:
URL: https://github.com/apache/kafka/pull/19759#issuecomment-2899443174

   > @apoorvmittal10 : Thanks for the explanation. It seems that we found a 
case that an acquired sharePartition lock was released more than once. I am 
still not sure how it happened. forceComplete() is supposed to only return true 
once and the lock is only released when forceComplete() returns true. How is 
the acquired lock released a second time?
   
   Prior to this PR there is no lock in 
[forceComplete](https://github.com/apache/kafka/blob/e88c10d5952c464a0f597b295deaa51e4c9fa978/server-common/src/main/java/org/apache/kafka/server/purgatory/DelayedOperation.java#L73)
 but a way to make sure that onComplete is not executed twice. However, 
consider that thread 1 has started executing 
[safeTryComplete](https://github.com/apache/kafka/blob/e88c10d5952c464a0f597b295deaa51e4c9fa978/server-common/src/main/java/org/apache/kafka/server/purgatory/DelayedOperation.java#L133)
  which attains lock and checks that if the operation has marked completed, if 
not then tryComplete is executed. Now thread 1 is executing tryComplete, then 
thread 2 on timeout executes 
[forceComplete](https://github.com/apache/kafka/blob/e88c10d5952c464a0f597b295deaa51e4c9fa978/server-common/src/main/java/org/apache/kafka/server/purgatory/DelayedOperation.java#L148)
 which marks completed as true and waits 
[here](https://github.com/apache/kafka/blob/e88c10d5952c464a0f59
 
7b295deaa51e4c9fa978/core/src/main/java/kafka/server/share/DelayedShareFetch.java#L199)
 for the operation lock. Now thread 1, in tryComplete, acquires some share 
partitions but as 
[completed](https://github.com/apache/kafka/blob/e88c10d5952c464a0f597b295deaa51e4c9fa978/core/src/main/java/kafka/server/share/DelayedShareFetch.java#L899)
 is already marked true then the share partitions are released and so the lock 
for delayed operation. Thread 2 now starts executing `onComplete` but as 
[partitionsAcquired](https://github.com/apache/kafka/blob/e88c10d5952c464a0f597b295deaa51e4c9fa978/core/src/main/java/kafka/server/share/DelayedShareFetch.java#L106)
 is at instance level hence `onComplete` tries to execute further with those 
share partitions which are already released. Once completed then onCompete 
releases those share partitions again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to