bharanic-dev opened a new pull request #14634:
URL: https://github.com/apache/pulsar/pull/14634


   
   Fixes #14633 
   
   
   ### Motivation
   
   
https://github.com/apache/pulsar/blob/2b3e8aeb5a1c259e0325e5a91dc5d7e20c6ee569/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1757
   
   makes a blocking call to a metadata-store operation (stack trace below) while 
holding a lock, because the enclosing method is declared `synchronized`. The 
callback that completes the future cannot run: the metadata-store executor is 
single-threaded, and its thread is blocked waiting for the lock held by this 
thread. The two threads wait on each other, so the broker deadlocks.
   
   "pulsar-backlog-quota-checker-30-1" #88 prio=5 os_prio=0 cpu=662.90ms 
elapsed=81026.81s tid=0x00007f17031ce000 nid=0xa4 waiting on condition  
[0x00007f15a631e000]
      java.lang.Thread.State: TIMED_WAITING (parking)
           at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
           - parking to wait for  <0x00000007c1f55408> (a java.util.concurrent.CompletableFuture$Signaller)
           at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
           at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1798)
           at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3128)
           at java.util.concurrent.CompletableFuture.timedGet([email protected]/CompletableFuture.java:1868)
           at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:2021)
           at org.apache.pulsar.broker.resources.BaseResources.get(BaseResources.java:86)
           at org.apache.pulsar.broker.resources.NamespaceResources.getPolicies(NamespaceResources.java:105)
           at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:70)
           at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuota(BacklogQuotaManager.java:82)
           at org.apache.pulsar.broker.service.BacklogQuotaManager.getBacklogQuotaLimitInSize(BacklogQuotaManager.java:101)
           at org.apache.pulsar.broker.service.persistent.PersistentTopic.isSizeBacklogExceeded(PersistentTopic.java:2502)
           at org.apache.pulsar.broker.service.BrokerService.lambda$monitorBacklogQuota$69(BrokerService.java:1611)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$713/0x00000008406a9840.accept(Unknown Source)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$709/0x00000008406a8840.accept(Unknown Source)
           at java.util.Optional.ifPresent([email protected]/Optional.java:183)
           at org.apache.pulsar.broker.service.BrokerService.lambda$forEachTopic$68(BrokerService.java:1599)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$708/0x00000008406a8440.accept(Unknown Source)
           at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:387)
           at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:159)
           at org.apache.pulsar.broker.service.BrokerService.forEachTopic(BrokerService.java:1597)
           at org.apache.pulsar.broker.service.BrokerService.monitorBacklogQuota(BrokerService.java:1608)
           - locked <0x00000003018b1c30> (a org.apache.pulsar.broker.service.BrokerService)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$320/0x00000008403f4840.run(Unknown Source)
           at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32)
           at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
           at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:515)
           at java.util.concurrent.FutureTask.runAndReset([email protected]/FutureTask.java:305)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:305)
           at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.lang.Thread.run([email protected]/Thread.java:829)
   
   
https://github.com/apache/pulsar/blob/0a91196dcc4d31ae647867ed319b8c1af0cb93c6/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L78
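
The pattern described in this section can be reproduced in miniature. The following is a minimal sketch, not Pulsar code: `DeadlockSketch`, `metadataExecutor`, `callbackNeedingLock` and `readPoliciesAsync` are hypothetical stand-ins, and the real callback chain is simplified to a single task that needs the object's monitor before it completes the future. A timeout is used so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DeadlockSketch {
    // Hypothetical stand-in for the single-threaded metadata-store executor.
    private final ExecutorService metadataExecutor = Executors.newSingleThreadExecutor();

    // Hypothetical callback that must acquire this object's monitor.
    private synchronized void callbackNeedingLock() {
        // Intentionally empty: acquiring the monitor is the point.
    }

    // Hypothetical async read: the future completes only after the callback has run.
    private CompletableFuture<String> readPoliciesAsync() {
        CompletableFuture<String> future = new CompletableFuture<>();
        metadataExecutor.execute(() -> {
            callbackNeedingLock(); // blocks while another thread holds the monitor
            future.complete("policies");
        });
        return future;
    }

    // Returns true if the future completed within the timeout, false on timeout.
    private boolean await(CompletableFuture<String> future, long timeoutMs) {
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Blocks on the future while holding the monitor: the executor thread
    // cannot acquire the monitor, so the future cannot complete in time.
    synchronized boolean checkWhileHoldingLock(long timeoutMs) {
        return await(readPoliciesAsync(), timeoutMs);
    }

    // Same blocking call without the monitor: the callback runs and the future completes.
    boolean checkWithoutLock(long timeoutMs) {
        return await(readPoliciesAsync(), timeoutMs);
    }

    void shutdown() {
        metadataExecutor.shutdown();
    }

    public static void main(String[] args) {
        DeadlockSketch sketch = new DeadlockSketch();
        System.out.println("completed while holding lock: " + sketch.checkWhileHoldingLock(200));
        System.out.println("completed without lock: " + sketch.checkWithoutLock(2000));
        sketch.shutdown();
    }
}
```

In this sketch the first call times out because the only executor thread is parked on the monitor held by the caller; the second call succeeds once the caller no longer holds it.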
   
   ### Modifications
   
   The `synchronized` keyword was added as part of 
https://github.com/apache/pulsar/pull/14367, and it causes the deadlock. The 
keyword is not required because the topics data structure is a concurrent hash 
map, which is safe to iterate without an external lock, so this change removes 
it.
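
As an illustration of why the external lock is unnecessary, here is a minimal sketch that iterates a concurrent map from an unsynchronized method. It uses the JDK's `ConcurrentHashMap` as a stand-in for Pulsar's `ConcurrentOpenHashMap`; the class `TopicScanSketch`, its fields, and the backlog values are hypothetical and are not the broker's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class TopicScanSketch {
    // Hypothetical registry: topic name -> backlog size in bytes.
    private final Map<String, Long> topics = new ConcurrentHashMap<>();

    void addTopic(String name, long backlogBytes) {
        topics.put(name, backlogBytes);
    }

    // Not synchronized: the concurrent map supports traversal while other
    // threads add or remove entries, so no monitor is held during the scan.
    int countOverQuota(long limitBytes) {
        AtomicInteger over = new AtomicInteger();
        topics.forEach((name, backlogBytes) -> {
            if (backlogBytes > limitBytes) {
                over.incrementAndGet();
            }
        });
        return over.get();
    }

    public static void main(String[] args) {
        TopicScanSketch sketch = new TopicScanSketch();
        sketch.addTopic("persistent://tenant/ns/a", 50L);
        sketch.addTopic("persistent://tenant/ns/b", 500L);
        sketch.addTopic("persistent://tenant/ns/c", 5000L);
        System.out.println(sketch.countOverQuota(100L)); // prints 2
    }
}
```

Because the scan holds no monitor, any blocking call made inside the loop can no longer prevent a callback on another thread from acquiring that monitor.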
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   The fix was also verified in production. After it was deployed, the broker 
deadlocks and restarts went away.
   
   ### Documentation
   - [x] `no-need-doc` 
   
   Internal fix. Not user visible.
   
   

