aloyszhang opened a new issue #7751:
URL: https://github.com/apache/pulsar/issues/7751


   **Describe the bug**
   Pulsar-Admin operations such as `topics get-partitioned-topic-metadata` or `topics list` do not work; they all time out.
   
   **To Reproduce**
   Hard to reproduce.
   
   **Additional context**
   While troubleshooting, we found that most of the pulsar-web threads (44 of 48) are blocked, and all of the blocked threads show the same stack trace:
   ```
   "pulsar-web-32-42" #4770 prio=5 os_prio=0 tid=0x00007f4954bd9800 nid=0x10d9 waiting on condition [0x00007f48ead21000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006cf643200> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
        at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
        at org.apache.pulsar.zookeeper.ZooKeeperChildrenCache.get(ZooKeeperChildrenCache.java:62)
        at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalGetList(PersistentTopicsBase.java:141)
        at org.apache.pulsar.broker.admin.v2.PersistentTopics.getList(PersistentTopics.java:84)
        at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$250/1014555985.invoke(Unknown Source)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:183)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104)
        at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:268)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256)
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
        at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416)
        at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
        at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
        at org.apache.pulsar.broker.web.ResponseHandlerFilter.doFilter(ResponseHandlerFilter.java:53)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:542)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
        at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:173)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:494)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
   ```
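   As we can see, the threads are blocked in `ZooKeeperChildrenCache#get()`, on the `join()` of the future returned by `ZooKeeperCache#getChildrenAsync()`: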
   ```java
   // ZooKeeperChildrenCache#get()
   public Set<String> get(String path) throws KeeperException, InterruptedException {
       if (LOG.isDebugEnabled()) {
           LOG.debug("getChildren called at: {}", path);
       }

       // join() has no timeout: the calling thread parks until the future completes
       Set<String> children = cache.getChildrenAsync(path, this).join();
       if (children == null) {
           throw KeeperException.create(KeeperException.Code.NONODE);
       }

       return children;
   }
   // ...
   // ZooKeeperCache#getChildrenAsync()
   public CompletableFuture<Set<String>> getChildrenAsync(String path, Watcher watcher) {
       return childrenCache.get(path, (p, executor) -> {
           CompletableFuture<Set<String>> future = new CompletableFuture<>();
           executor.execute(SafeRunnable.safeRun(() -> {
               ZooKeeper zk = zkSession.get();
               if (zk == null) {
                   future.completeExceptionally(new IOException("ZK session not ready"));
                   return;
               }

               zk.getChildren(path, watcher, (rc, path1, ctx, children) -> {
                   if (rc == Code.OK.intValue()) {
                       future.complete(Sets.newTreeSet(children));
                   } else if (rc == Code.NONODE.intValue()) {
                       // The node we want may not exist yet, so put a watcher on its existence
                       // before throwing up the exception. It's possible that the node could have
                       // been created after the call to getChildren, but before the call to exists().
                       // If this is the case, exists will return true, and we just call getChildren again.
                       existsAsync(path, watcher).thenAccept(exists -> {
                           if (exists) {
                               getChildrenAsync(path, watcher)
                                       .thenAccept(c -> future.complete(c))
                                       .exceptionally(ex -> {
                                           future.completeExceptionally(ex);
                                           return null;
                                       });
                           } else {
                               // Z-node does not exist
                               future.complete(Collections.emptySet());
                           }
                       }).exceptionally(ex -> {
                           future.completeExceptionally(ex);
                           return null;
                       });
                   } else {
                       future.completeExceptionally(KeeperException.create(rc));
                   }
               }, null);
           }));

           return future;
       });
   }
   ```
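
To illustrate why a `join()` on a future that never completes parks the caller indefinitely, here is a minimal, self-contained sketch. It is not Pulsar code: `BoundedWaitSketch` and `getWithTimeout` are hypothetical names, and the never-completed future only stands in for whatever keeps the real future from completing, which is still unknown. It shows how a bounded `get(timeout, unit)` would surface such a hang as a `TimeoutException` instead of leaving the thread in `WAITING (parking)`.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {

    // Hypothetical helper: waits at most timeoutMs for the future instead of
    // calling join(), which has no timeout and parks the caller indefinitely.
    static Set<String> getWithTimeout(CompletableFuture<Set<String>> future, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        // Assumption for illustration: a future that nobody ever completes.
        CompletableFuture<Set<String>> neverCompleted = new CompletableFuture<>();
        try {
            getWithTimeout(neverCompleted, 100);
            System.out.println("completed");
        } catch (TimeoutException e) {
            // With join() this thread would have stayed parked forever.
            System.out.println("timed out");
        }
    }
}
```

A bounded wait like this would not fix the underlying cause, but it would keep the web threads from piling up and would make the failure visible in the logs.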
   At first, we thought this might be caused by the ForkJoinPool having no idle threads, but jstack shows that the ForkJoinPool workers are not busy:
   ```
   "ForkJoinPool.commonPool-worker-1" #20451 daemon prio=5 os_prio=0 tid=0x00007f4957b70000 nid=0x6def waiting on condition [0x00007f48e1d8c000]
      java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c0466200> (a java.util.concurrent.ForkJoinPool)
        at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

   "ForkJoinPool.commonPool-worker-8" #20450 daemon prio=5 os_prio=0 tid=0x00007f495796d800 nid=0x39a3 waiting on condition [0x00007f48e5f97000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c0466200> (a java.util.concurrent.ForkJoinPool)
        at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

   "ForkJoinPool.commonPool-worker-15" #20449 daemon prio=5 os_prio=0 tid=0x00007f4956ca9800 nid=0x2b9c waiting on condition [0x00007f48e1b8a000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c0466200> (a java.util.concurrent.ForkJoinPool)
        at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

   "ForkJoinPool.commonPool-worker-22" #20448 daemon prio=5 os_prio=0 tid=0x00007f4957938800 nid=0x152d waiting on condition [0x00007f48e1c8b000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006c0466200> (a java.util.concurrent.ForkJoinPool)
        at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
   
   ```
   We are stuck here. Does anybody have any suggestions?

