codelipenghui opened a new pull request #13744:
URL: https://github.com/apache/pulsar/pull/13744


   ### Motivation
   
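   Fixes #13736: a deadlock that occurs when a new ledger is created on a ZooKeeper client thread.
   
   In the thread dump below, the ZooKeeper `SendThread` holds the `LinkedBlockingQueue` monitor in `EventThread.queuePacket` and runs the ledger-close callback chain inline. That chain reaches `ManagedLedgerImpl.createLedgerAfterClosed`, which calls `ZooKeeper.create` and blocks on the `ZooKeeper$States` monitor. `PullMessageThread_167` holds that `States` monitor inside `ClientCnxn.queuePacket` (from a `getData` call) and is blocked waiting for the same `LinkedBlockingQueue`. Neither thread can proceed.
   
   The details of the deadlock: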
   
   ```
   Found one Java-level deadlock:
   =============================
   "ZKC-connect-executor-0-SendThread(9.142.172.233:2181)":
     waiting to lock monitor 0x00007fa3fc033bd8 (object 0x00007fad5e804158, a org.apache.zookeeper.ZooKeeper$States),
     which is held by "PullMessageThread_167"
   "PullMessageThread_167":
     waiting to lock monitor 0x00007fbe40026668 (object 0x00007fb596023c08, a java.util.concurrent.LinkedBlockingQueue),
     which is held by "ZKC-connect-executor-0-SendThread(9.142.172.233:2181)"
   
   Java stack information for the threads listed above:
   ===================================================
   "ZKC-connect-executor-0-SendThread(9.142.172.233:2181)":
        at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1678)
        - waiting to lock <0x00007fad5e804158> (a org.apache.zookeeper.ZooKeeper$States)
        at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1649)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1905)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$10.zkRun(ZooKeeperClient.java:752)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$ZkRetryRunnable.run(ZooKeeperClient.java:392)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient.create(ZooKeeperClient.java:762)
        at org.apache.bookkeeper.util.ZkUtils.asyncCreateFullPathOptimistic(ZkUtils.java:75)
        at org.apache.bookkeeper.meta.ZkLedgerIdGenerator.generateLedgerIdImpl(ZkLedgerIdGenerator.java:78)
        at org.apache.bookkeeper.meta.ZkLedgerIdGenerator.generateLedgerId(ZkLedgerIdGenerator.java:73)
        at org.apache.bookkeeper.meta.LongZkLedgerIdGenerator.generateLedgerId(LongZkLedgerIdGenerator.java:301)
        at org.apache.bookkeeper.client.LedgerCreateOp.generateLedgerIdAndCreateLedger(LedgerCreateOp.java:194)
        at org.apache.bookkeeper.client.LedgerCreateOp.initiate(LedgerCreateOp.java:182)
        at org.apache.bookkeeper.client.BookKeeper.asyncCreateLedger(BookKeeper.java:860)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncCreateLedger(ManagedLedgerImpl.java:3645)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.createLedgerAfterClosed(ManagedLedgerImpl.java:1596)
        - locked <0x00007fadbdfa7b90> (a org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.ledgerClosed(ManagedLedgerImpl.java:1587)
        - locked <0x00007fadbdfa7b90> (a org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl)
        at org.apache.bookkeeper.mledger.impl.OpAddEntry.closeComplete(OpAddEntry.java:236)
        at org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$0(LedgerHandle.java:552)
        at org.apache.bookkeeper.client.LedgerHandle$5$$Lambda$1557/731657311.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$3(LedgerHandle.java:614)
        at org.apache.bookkeeper.client.LedgerHandle$5$$Lambda$1563/295431727.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.bookkeeper.client.MetadataUpdateLoop.lambda$writeLoop$1(MetadataUpdateLoop.java:161)
        at org.apache.bookkeeper.client.MetadataUpdateLoop$$Lambda$1562/765900306.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.bookkeeper.meta.AbstractZkLedgerManager$4.processResult(AbstractZkLedgerManager.java:508)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$22$1.processResult(ZooKeeperClient.java:1094)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:638)
        at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:541)
        - locked <0x00007fb596023c08> (a java.util.concurrent.LinkedBlockingQueue)
        at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:781)
        at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:818)
        at org.apache.zookeeper.ClientCnxn.access$2600(ClientCnxn.java:106)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1403)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1331)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1309)
   "PullMessageThread_167":
        at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:537)
        - waiting to lock <0x00007fb596023c08> (a java.util.concurrent.LinkedBlockingQueue)
        at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:781)
        at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:818)
        at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1680)
        - locked <0x00007fad5e804158> (a org.apache.zookeeper.ZooKeeper$States)
        at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1649)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2411)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19.zkRun(ZooKeeperClient.java:1009)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$ZkRetryRunnable.run(ZooKeeperClient.java:392)
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient.getData(ZooKeeperClient.java:1019)
        at org.apache.bookkeeper.meta.AbstractZkLedgerManager.readLedgerMetadata(AbstractZkLedgerManager.java:435)
        at org.apache.bookkeeper.meta.AbstractZkLedgerManager.readLedgerMetadata(AbstractZkLedgerManager.java:430)
        at org.apache.bookkeeper.meta.CleanupLedgerManager.readLedgerMetadata(CleanupLedgerManager.java:157)
        at org.apache.bookkeeper.client.LedgerOpenOp.initiate(LedgerOpenOp.java:114)
        at org.apache.bookkeeper.client.LedgerOpenOp$OpenBuilderImpl.open(LedgerOpenOp.java:269)
        at org.apache.bookkeeper.client.LedgerOpenOp$OpenBuilderImpl.execute(LedgerOpenOp.java:247)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.lambda$getLedgerHandle$20(ManagedLedgerImpl.java:1773)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$$Lambda$1195/876838597.apply(Unknown Source)
        at org.apache.pulsar.common.util.collections.ConcurrentLongHashMap$Section.put(ConcurrentLongHashMap.java:287)
        at org.apache.pulsar.common.util.collections.ConcurrentLongHashMap.computeIfAbsent(ConcurrentLongHashMap.java:135)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.getLedgerHandle(ManagedLedgerImpl.java:1742)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncReadEntry(ManagedLedgerImpl.java:1839)
        at org.apache.bookkeeper.mledger.impl.OpFindNewest.find(OpFindNewest.java:147)
        at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncFindPosition(ManagedLedgerImpl.java:1670)
        at org.streamnative.pulsar.handlers.rocketmq.utils.MessageIdUtils.getPositionForOffset(MessageIdUtils.java:183)
        at org.streamnative.pulsar.handlers.rocketmq.inner.RopServerCnx.getMessage(RopServerCnx.java:692)
        at org.streamnative.pulsar.handlers.rocketmq.inner.processor.PullMessageProcessor.processRequest(PullMessageProcessor.java:359)
        at org.streamnative.pulsar.handlers.rocketmq.inner.processor.PullMessageProcessor.processRequest(PullMessageProcessor.java:170)
        at org.streamnative.pulsar.handlers.rocketmq.inner.proxy.RopBrokerProxy$PullMessageProcessorProxy.processRequest(RopBrokerProxy.java:1098)
        at org.streamnative.pulsar.handlers.rocketmq.inner.NettyRemotingAbstract$1.run(NettyRemotingAbstract.java:202)
        at org.apache.rocketmq.remoting.netty.RequestTask.run(RequestTask.java:80)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   
   Found 1 deadlock.
   ```
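
The "waiting to lock ... which is held by" pairs at the top of the dump form a two-node wait-for cycle. As a reading aid only, here is a minimal sketch that models those two edges and walks them until a thread repeats. The class `WaitForCycleSketch` and the method `findCycle` are hypothetical names for this illustration and are not part of Pulsar, BookKeeper, or ZooKeeper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WaitForCycleSketch {

    /**
     * Follows "is blocked by" edges starting at {@code start}.
     * Returns the threads that form a cycle, or an empty list if the walk ends.
     */
    static List<String> findCycle(Map<String, String> blockedBy, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = blockedBy.get(current);
        }
        if (current == null) {
            return new ArrayList<>(); // the chain ended, so nothing is deadlocked
        }
        // Keep only the part of the path that lies on the cycle.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    public static void main(String[] args) {
        Map<String, String> blockedBy = new HashMap<>();
        // SendThread wants the ZooKeeper$States monitor held by PullMessageThread_167.
        blockedBy.put("ZK SendThread", "PullMessageThread_167");
        // PullMessageThread_167 wants the LinkedBlockingQueue monitor held by SendThread.
        blockedBy.put("PullMessageThread_167", "ZK SendThread");

        System.out.println(findCycle(blockedBy, "ZK SendThread"));
    }
}
```

Removing either edge makes `findCycle` return an empty list, which is the effect any fix needs to have on the real threads.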
   
   ### Modification
   
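   Create the new ledger on the managed ledger's executor instead of on the calling ZooKeeper thread. The ZooKeeper callback then returns without re-entering the ZooKeeper client while it still holds the client's internal locks, which avoids the deadlock.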
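
The hand-off pattern can be sketched in isolation. This is a simplified illustration under stated assumptions, not the code in this PR: `clientLock` stands in for a monitor the ZooKeeper client holds while it runs callbacks, `ledgerExecutor` stands in for the managed ledger's executor, and `createLedger` only reports whether the calling thread holds the lock. All names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ExecutorHandoffSketch {

    // Stand-in for "create a ledger": reports whether the caller holds the client lock.
    static boolean createLedger(ReentrantLock clientLock) {
        return clientLock.isHeldByCurrentThread();
    }

    // Before: the callback does the work itself, on the thread that holds the lock.
    static boolean onLedgerClosedInline(ReentrantLock clientLock) {
        return createLedger(clientLock);
    }

    // After: the callback only schedules the work on another executor and returns.
    static CompletableFuture<Boolean> onLedgerClosedHandOff(
            ReentrantLock clientLock, ExecutorService ledgerExecutor) {
        return CompletableFuture.supplyAsync(() -> createLedger(clientLock), ledgerExecutor);
    }

    public static void main(String[] args) throws Exception {
        ReentrantLock clientLock = new ReentrantLock();
        ExecutorService ledgerExecutor = Executors.newSingleThreadExecutor();
        try {
            boolean inline;
            CompletableFuture<Boolean> handedOff;
            clientLock.lock(); // simulate the client invoking a callback under its lock
            try {
                inline = onLedgerClosedInline(clientLock);
                handedOff = onLedgerClosedHandOff(clientLock, ledgerExecutor);
            } finally {
                clientLock.unlock();
            }
            System.out.println("inline ran under client lock: " + inline);
            System.out.println("hand-off ran under client lock: "
                    + handedOff.get(5, TimeUnit.SECONDS));
        } finally {
            ledgerExecutor.shutdown();
        }
    }
}
```

The inline variant runs on the thread that holds the lock, so any call back into the client happens with that lock held. The hand-off variant runs on the executor's thread, which does not hold it.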
   
   ### Documentation
   
   Check the box below or label this PR directly (if you have committer 
privilege).
   
   Need to update docs? 
   
   - [ ] `doc-required` 
     
     (If you need help on updating docs, create a doc issue)
     
   - [x] `no-need-doc` 
     
     (Please explain why)
     
   - [ ] `doc` 
     
     (If this PR contains doc changes)
   