slominskir opened a new issue #6248: Shutdown Standalone Gracefully URL: https://github.com/apache/pulsar/issues/6248 A standalone Pulsar daemon server often throws various Exceptions when a shutdown is requested. This does not inspire confidence in the product. Can I trust this product in production? If the developers couldn't get a clean shutdown on an idle server, what are the chances this beast works reliably with tons of messages going in and out? And if the developers couldn't figure it out in the base case of a single server and no messages - how am I going to troubleshoot this huge amount of code that takes an hour to compile and apparently requires six servers to test under load? Example Exception 1: ``` 14:17:48.392 [Curator-LeaderSelector-0] WARN org.apache.bookkeeper.stream.storage.impl.cluster.ClusterControllerLeaderImpl - Controller leader is interrupted, giving up leadership 14:17:48.396 [Curator-LeaderSelector-0] ERROR org.apache.curator.framework.recipes.leader.LeaderSelector - The leader threw an exception java.lang.InterruptedException: null at java.lang.Object.wait(Native Method) ~[?:1.8.0_232] at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_232] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1529) ~[org.apache.pulsar-pulsar-zookeeper-2.5.0.jar:2.5.0] at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1512) ~[org.apache.pulsar-pulsar-zookeeper-2.5.0.jar:2.5.0] at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1791) ~[org.apache.pulsar-pulsar-zookeeper-2.5.0.jar:2.5.0] at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1] at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1] at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[org.apache.curator-curator-client-4.0.1.jar:?] at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[org.apache.curator-curator-client-4.0.1.jar:?] at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1] at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1] at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34) ~[org.apache.curator-curator-framework-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [org.apache.curator-curator-recipes-4.0.1.jar:4.0.1] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_232] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232] ``` Example Exception 2: ``` 14:17:48.511 [Thread-1] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=2243,loopId=6839c228) Error writing metadata to store org.apache.bookkeeper.client.BKException$BKClientClosedException: BookKeeper client is closed at org.apache.bookkeeper.meta.CleanupLedgerManager.close(CleanupLedgerManager.java:245) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at org.apache.bookkeeper.client.BookKeeper.close(BookKeeper.java:1410) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at org.apache.distributedlog.BookKeeperClient.close(BookKeeperClient.java:271) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.distributedlog.impl.BKNamespaceDriver.doClose(BKNamespaceDriver.java:404) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.distributedlog.impl.BKNamespaceDriver.close(BKNamespaceDriver.java:385) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at com.google.common.io.Closeables.close(Closeables.java:78) ~[com.google.guava-guava-25.1-jre.jar:?] at org.apache.distributedlog.util.Utils.close(Utils.java:544) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.distributedlog.BKDistributedLogNamespace.close(BKDistributedLogNamespace.java:341) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.bookkeeper.stream.server.service.DLNamespaceProviderService.doClose(DLNamespaceProviderService.java:135) ~[org.apache.bookkeeper-stream-storage-server-4.10.0.jar:4.10.0] at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:123) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$close$4(LifecycleComponentStack.java:123) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407) [com.google.guava-guava-25.1-jre.jar:?] at org.apache.bookkeeper.common.component.LifecycleComponentStack.close(LifecycleComponentStack.java:123) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent.doClose(StreamStorageLifecycleComponent.java:61) [org.apache.bookkeeper-stream-storage-server-4.10.0.jar:4.10.0] at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:123) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.stop(LocalBookkeeperEnsemble.java:443) [org.apache.pulsar-pulsar-zookeeper-utils-2.5.0.jar:2.5.0] at org.apache.pulsar.PulsarStandaloneStarter$1.run(PulsarStandaloneStarter.java:102) [org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0] 14:17:48.512 [DLM-/stream/storage-OrderedScheduler-1-0-EventThread] INFO org.apache.distributedlog.BKLogWriteHandler - Try storing max sequence number 4 in completing /stream/storage/streams_000000000000000001_000000000000000001_000000000000000000/<default>/ledgers/inprogress_000000000000000004. ``` Example Exception 3: ``` 14:17:48.512 [Thread-1] ERROR org.apache.distributedlog.BKAbstractLogWriter - Completing Log segments encountered exception java.io.IOException: Failed to close ledger for streams_000000000000000000_000000000000000001_000000000000000000:<default>:inprogress_000000000000000004 : BookKeeper client is closed at org.apache.distributedlog.BKLogSegmentWriter$6.closeComplete(BKLogSegmentWriter.java:660) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$0(LedgerHandle.java:552) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232] at org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$3(LedgerHandle.java:614) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232] at org.apache.bookkeeper.client.MetadataUpdateLoop.lambda$writeLoop$1(MetadataUpdateLoop.java:146) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232] at org.apache.bookkeeper.meta.CleanupLedgerManager.lambda$close$1(CleanupLedgerManager.java:246) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649) ~[?:1.8.0_232] at org.apache.bookkeeper.meta.CleanupLedgerManager.close(CleanupLedgerManager.java:246) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at org.apache.bookkeeper.client.BookKeeper.close(BookKeeper.java:1410) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0] at org.apache.distributedlog.BookKeeperClient.close(BookKeeperClient.java:271) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.distributedlog.impl.BKNamespaceDriver.doClose(BKNamespaceDriver.java:404) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.distributedlog.impl.BKNamespaceDriver.close(BKNamespaceDriver.java:385) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at com.google.common.io.Closeables.close(Closeables.java:78) ~[com.google.guava-guava-25.1-jre.jar:?] at org.apache.distributedlog.util.Utils.close(Utils.java:544) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.distributedlog.BKDistributedLogNamespace.close(BKDistributedLogNamespace.java:341) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0] at org.apache.bookkeeper.stream.server.service.DLNamespaceProviderService.doClose(DLNamespaceProviderService.java:135) ~[org.apache.bookkeeper-stream-storage-server-4.10.0.jar:4.10.0] at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:123) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$close$4(LifecycleComponentStack.java:123) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:407) [com.google.guava-guava-25.1-jre.jar:?] at org.apache.bookkeeper.common.component.LifecycleComponentStack.close(LifecycleComponentStack.java:123) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent.doClose(StreamStorageLifecycleComponent.java:61) [org.apache.bookkeeper-stream-storage-server-4.10.0.jar:4.10.0] at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:123) [org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0] at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.stop(LocalBookkeeperEnsemble.java:443) [org.apache.pulsar-pulsar-zookeeper-utils-2.5.0.jar:2.5.0] at org.apache.pulsar.PulsarStandaloneStarter$1.run(PulsarStandaloneStarter.java:102) [org.apache.pulsar-pulsar-broker-2.5.0.jar:2.5.0] ``` **To Reproduce** 1. Download latest version of Pulsar (2.5.0 at this time) 2. Execute "bin/pulsar-daemon start standalone" 3. Monitor the log file to verify all is well so far 4. Execute "bin/pulsar-daemon stop standalone" 5. Observe disturbing number of Exceptions in log file **Expected behavior** I expect the software to shutdown cleanly (without Exceptions and stack traces). - OS: Red Hat Enterprise Linux 7 **Additional context** Server is idle (no clients).
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
