candlerb opened a new issue #5652: Server wedged after deleting and recreating functions
URL: https://github.com/apache/pulsar/issues/5652

**Describe the bug**
I managed to get pulsar (standalone 2.4.1) into a wedged state reporting bookkeeper errors after deleting and recreating the same function.

**To Reproduce**

```
$ cat womble.yaml
tenant: public
namespace: default
name: womble
py: /home/ubuntu/func1.py
className: func1.FirstFunction
inputs: [my-topic]
userConfig:
  wibble: bibble

$ cat func1.py
from pulsar import Function

class FirstFunction(Function):
    def process(self, item, context):
        log = context.get_logger()
        log.info("(v%r) Got %r with properties %r" % (context.get_function_version(), item, context.get_message_properties()))

$ apache-pulsar-2.4.1/bin/pulsar-admin functions create --function-config-file /home/ubuntu/womble.yaml
...
$ apache-pulsar-2.4.1/bin/pulsar-admin functions delete --name womble
```

and repeat.

**Expected behavior**
Obviously, should be able to delete and recreate the function as many times as required.

**Actual behavior**
I got into a state where pulsar commands are hanging, and timing out after 60 seconds.

tcpdump for pulsar-admin create is in [this gist](https://gist.github.com/candlerb/e0d926ce3488f762fff91c8f1ddd8a4f). Shows repeated `org.apache.bookkeeper.mledger.ManagedLedgerException: ManagedCursor not found: public%2Fdefault%2Fwomble`

With pulsar-admin delete, tcpdump shows nothing useful - it just times out.
However, in the foreground output of the pulsar server itself, I see:

```
10:43:12.569 [cluster-service-coordinator-timer] ERROR org.apache.pulsar.functions.worker.MembershipManager - Failed to get status of coordinate topic persistent://public/functions/coordinate
org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:419) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:126) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.MembershipManager.checkFailures(MembershipManager.java:174) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.WorkerService.lambda$start$0(WorkerService.java:191) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.ClusterServiceCoordinator.lambda$start$0(ClusterServiceCoordinator.java:72) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_222]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_222]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:412) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
    ... 11 more
10:43:12.570 [cluster-service-coordinator-timer] ERROR org.apache.pulsar.functions.worker.ClusterServiceCoordinator - Cluster timer task membership-monitor failed with exception.
java.lang.RuntimeException: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
    at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:130) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.MembershipManager.checkFailures(MembershipManager.java:174) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.WorkerService.lambda$start$0(WorkerService.java:191) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.ClusterServiceCoordinator.lambda$start$0(ClusterServiceCoordinator.java:72) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:419) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:126) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    ... 10 more
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_222]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_222]
    at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:412) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
    at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:126) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
    ... 10 more
```

Now that the server is in this state, it seems completely wedged. Even `apache-pulsar-2.4.1/bin/pulsar-admin functions list` times out.

There are some errors reported on pulsar output which may or may not be related: [gist](https://gist.github.com/candlerb/f984bc00c8e8a020a5993b83d2e13293).

`apache-pulsar-2.4.1/bin/pulsar-admin topics list public/default` also times out. While it was waiting, I hit ctrl-C on the server, and the backtrace is in this [gist](https://gist.github.com/candlerb/688db64659d5b248e3fa1e0874a2d8dd).

After restarting the server, it seems happy. I can list topics, successfully create function, delete function, and repeat. So actually reproducing this again may be harder. I hope the backtraces contain some useful information about what went wrong.
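
Since the failure only showed up after several create/delete rounds, the cycle from "To Reproduce" could be scripted along these lines. This is only a rough sketch, not something that was run for this report: the helper names `build_commands` and `repeat_cycle`, the iteration count, and the 90-second timeout are assumptions for illustration, while the two `pulsar-admin` command lines are the ones shown above. It stops at the first round in which a command hangs or exits non-zero.

```python
import subprocess

# Paths and names copied from the report; adjust for your installation.
PULSAR_ADMIN = "apache-pulsar-2.4.1/bin/pulsar-admin"
CONFIG_FILE = "/home/ubuntu/womble.yaml"
FUNCTION_NAME = "womble"


def build_commands(admin=PULSAR_ADMIN, config=CONFIG_FILE, name=FUNCTION_NAME):
    """Return the (create, delete) argument lists used in the report."""
    create = [admin, "functions", "create", "--function-config-file", config]
    delete = [admin, "functions", "delete", "--name", name]
    return create, delete


def repeat_cycle(iterations=20, timeout=90, run=subprocess.run):
    """Run create then delete repeatedly (hypothetical helper).

    Returns the 1-based round in which a command timed out or exited
    non-zero, or None if every round completed cleanly.
    """
    create, delete = build_commands()
    for i in range(1, iterations + 1):
        for cmd in (create, delete):
            try:
                result = run(cmd, timeout=timeout, check=False)
            except subprocess.TimeoutExpired:
                return i  # the command hung past our own timeout
            if result.returncode != 0:
                return i  # e.g. pulsar-admin gave up after its own timeout
    return None
```

Calling `repeat_cycle()` against a running standalone server would then report the round at which things first went wrong, if any. The `run` parameter exists only so the loop can be exercised without a server.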
