candlerb opened a new issue #5652: Server wedged after deleting and recreating functions
URL: https://github.com/apache/pulsar/issues/5652
 
 
   **Describe the bug**
   I managed to get Pulsar (standalone 2.4.1) into a wedged state, reporting 
BookKeeper errors, after deleting and recreating the same function.
   
   **To Reproduce**
   
   ```
   $ cat womble.yaml
   tenant: public
   namespace: default
   name: womble
   py: /home/ubuntu/func1.py
   className: func1.FirstFunction
   inputs: [my-topic]
   userConfig:
     wibble: bibble
   
   $ cat func1.py
   from pulsar import Function
   
   class FirstFunction(Function):
       def process(self, item, context):
           log = context.get_logger()
           log.info("(v%r) Got %r with properties %r" %
                    (context.get_function_version(), item, context.get_message_properties()))
   
   $ apache-pulsar-2.4.1/bin/pulsar-admin functions create \
       --function-config-file /home/ubuntu/womble.yaml
   ...
   $ apache-pulsar-2.4.1/bin/pulsar-admin functions delete --name womble
   ```
   
   and repeat.
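
The create/delete cycle above can be automated. The following is a minimal sketch, not part of the original report: it reuses the two commands exactly as shown, while the function name `cycle_function`, the injectable `run` parameter, and the 90-second guard timeout are assumptions made for illustration.

```python
import subprocess

# Paths and names are copied from the commands in this report; adjust as needed.
PULSAR_ADMIN = "apache-pulsar-2.4.1/bin/pulsar-admin"
CONFIG_FILE = "/home/ubuntu/womble.yaml"
FUNCTION_NAME = "womble"

CREATE_CMD = [PULSAR_ADMIN, "functions", "create",
              "--function-config-file", CONFIG_FILE]
DELETE_CMD = [PULSAR_ADMIN, "functions", "delete", "--name", FUNCTION_NAME]


def cycle_function(iterations, run=subprocess.run, timeout=90):
    """Create and delete the function repeatedly.

    Returns None if every command succeeded, otherwise a tuple
    (iteration, command, reason) describing the first failure.
    `run` is injectable (hypothetical parameter) so the loop can be
    exercised without a running broker. The 90 s timeout is an assumed
    guard, chosen to exceed the 60 s client timeout seen in the report.
    """
    for i in range(1, iterations + 1):
        for cmd in (CREATE_CMD, DELETE_CMD):
            try:
                result = run(cmd, timeout=timeout)
            except subprocess.TimeoutExpired:
                return (i, cmd, "timeout")
            if result.returncode != 0:
                return (i, cmd, "exit status %d" % result.returncode)
    return None
```

Calling `cycle_function(50)` against a running standalone instance would stop at the first command that fails or hangs and report which iteration it was, which may help narrow down how many cycles it takes to reach the wedged state.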
   
   **Expected behavior**
   It should be possible to delete and recreate the function as many times 
as required.
   
   **Actual behavior**
   I got into a state where pulsar commands are hanging, and timing out after 
60 seconds.
   
   The tcpdump for `pulsar-admin functions create` is in [this 
gist](https://gist.github.com/candlerb/e0d926ce3488f762fff91c8f1ddd8a4f).  
It shows repeated errors of the form 
`org.apache.bookkeeper.mledger.ManagedLedgerException: ManagedCursor not found: public%2Fdefault%2Fwomble`.
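
The cursor name in that exception is percent-encoded. Decoding it, as in the small sketch below, gives the tenant, namespace and name from `womble.yaml`. This suggests that the missing cursor is associated with the function itself, though that reading is an assumption and is not confirmed anywhere in this report.

```python
from urllib.parse import unquote

# Cursor name copied verbatim from the exception message.
cursor_name = "public%2Fdefault%2Fwomble"

# %2F is the percent-encoding of "/", so the decoded name has three parts.
tenant, namespace, function_name = unquote(cursor_name).split("/")
print(tenant, namespace, function_name)
```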
   
   With pulsar-admin delete, tcpdump shows nothing useful - it just times out. 
However, in the foreground output of the pulsar server itself, I see:
   
   ```
   10:43:12.569 [cluster-service-coordinator-timer] ERROR org.apache.pulsar.functions.worker.MembershipManager - Failed to get status of coordinate topic persistent://public/functions/coordinate
   org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
           at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:419) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:126) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.MembershipManager.checkFailures(MembershipManager.java:174) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.WorkerService.lambda$start$0(WorkerService.java:191) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.ClusterServiceCoordinator.lambda$start$0(ClusterServiceCoordinator.java:72) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222]
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
   Caused by: java.util.concurrent.TimeoutException
           at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_222]
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_222]
           at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:412) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
           ... 11 more
   10:43:12.570 [cluster-service-coordinator-timer] ERROR org.apache.pulsar.functions.worker.ClusterServiceCoordinator - Cluster timer task membership-monitor failed with exception.
   java.lang.RuntimeException: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
           at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:130) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.MembershipManager.checkFailures(MembershipManager.java:174) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.WorkerService.lambda$start$0(WorkerService.java:191) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.ClusterServiceCoordinator.lambda$start$0(ClusterServiceCoordinator.java:72) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_222]
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_222]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_222]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_222]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
           at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
   Caused by: org.apache.pulsar.client.admin.PulsarAdminException$TimeoutException: java.util.concurrent.TimeoutException
           at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:419) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:126) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           ... 10 more
   Caused by: java.util.concurrent.TimeoutException
           at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771) ~[?:1.8.0_222]
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915) ~[?:1.8.0_222]
           at org.apache.pulsar.client.admin.internal.TopicsImpl.getStats(TopicsImpl.java:412) ~[org.apache.pulsar-pulsar-client-admin-original-2.4.1.jar:2.4.1]
           at org.apache.pulsar.functions.worker.MembershipManager.getCurrentMembership(MembershipManager.java:126) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
           ... 10 more
   ```
   
   Now that the server is in this state, it seems completely wedged: even 
`apache-pulsar-2.4.1/bin/pulsar-admin functions list` times out.  The Pulsar 
output also shows some errors which may or may not be related: 
[gist](https://gist.github.com/candlerb/f984bc00c8e8a020a5993b83d2e13293).
   
   `apache-pulsar-2.4.1/bin/pulsar-admin topics list public/default` also times 
out.  While it was waiting, I hit ctrl-C on the server, and the backtrace is in 
this [gist](https://gist.github.com/candlerb/688db64659d5b248e3fa1e0874a2d8dd).
   
   After restarting the server, it seems happy.  I can list topics, create 
the function, delete the function, and repeat, all successfully.
   
   So actually reproducing this again may be harder.  I hope the backtraces 
contain some useful information about what went wrong.
