BewareMyPower commented on PR #16969:
URL: https://github.com/apache/pulsar/pull/16969#issuecomment-1244938307

   This PR accidentally fixed a bug that could be reproduced in branch-2.10 so 
I added other release labels and will cherry-pick it soon. It's better to 
include this PR in 2.11.0 release.
   
   ----
   
   While I debugged the stuck cpp tests in branch-2.10, I found a deadlock in:
   
   ```
   (lldb) bt
   * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
     * frame #0: 0x00007ff806919bd2 libsystem_kernel.dylib`__psynch_mutexwait + 
10
       frame #1: 0x00007ff806951e7e 
libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
       frame #2: 0x00007ff80694fcbb 
libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 205
       frame #3: 0x00007ff8068b4719 libc++.1.dylib`std::__1::mutex::lock() + 9
       frame #4: 0x000000010452d677 
libpulsar.2.10.1.dylib`std::__1::unique_lock<std::__1::mutex>::unique_lock(this=0x00007ff7bfefddb8,
 __m=0x0000000103d04230) at __mutex_base:118:61
       frame #5: 0x000000010451fa1d 
libpulsar.2.10.1.dylib`std::__1::unique_lock<std::__1::mutex>::unique_lock(this=0x00007ff7bfefddb8,
 __m=0x0000000103d04230) at __mutex_base:118:54
       frame #6: 0x0000000104520ad4 
libpulsar.2.10.1.dylib`pulsar::PartitionedConsumerImpl::getNumPartitionsWithLock(this=0x0000000103d041b0)
 const at PartitionedConsumerImpl.cc:196:10
       frame #7: 0x0000000104522dec 
libpulsar.2.10.1.dylib`pulsar::PartitionedConsumerImpl::handleSinglePartitionConsumerClose(this=0x0000000103d041b0,
 result=ResultOk, partitionIndex=0, callback=pulsar::CloseCallback @ 
0x00007ff7bfefdfc0)>) at PartitionedConsumerImpl.cc:309:5
       frame #8: 0x000000010453a92b 
libpulsar.2.10.1.dylib`pulsar::PartitionedConsumerImpl::closeAsync(this=0x0000600002630010,
 result=ResultOk)>)::$_0::operator()(pulsar::Result) const at 
PartitionedConsumerImpl.cc:342:17
   ```
   
   See 
https://github.com/apache/pulsar/blob/2a31b8b080699efd64f5edc2685ef06f40e4c4ca/pulsar-client-cpp/lib/PartitionedConsumerImpl.cc#L326-L342
   
   The `consumersMutex_` should be unlocked after checking `consumers_.empty()` 
like:
   
   ```c++
       Lock lock(consumersMutex_);
       if (consumers_.empty()) {
           notifyResult(callback);
           return;
       }
       lock.unlock();  // HERE
   ```
   
   Because the subsequent `handleSinglePartitionConsumerClose` might be called 
in the current thread if the single consumer's state is not ready. Since 
`handleSinglePartitionConsumerClose` calls `getNumPartitionsWithLock` that 
acquires the same lock, the deadlock will happen.
   
   There should be a bug fix for it but this refactor PR already removes the 
bug. So I choose to cherry-pick this PR to older branches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to