fracasula opened a new issue #7682:
URL: https://github.com/apache/pulsar/issues/7682


   **Describe the bug**
   A consumer gets stuck after hitting an error that most probably comes from the 
cgo layer underneath: 
   
   ```
   terminate called after throwing an instance of 'std::bad_alloc'
   what(): std::bad_alloc
   ```
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Shell into a Kubernetes pod and reset the cursor for a given subscription, 
e.g. `bin/pulsar-admin topics reset-cursor persistent://public/default/SpaceEvents -s cloud-notifications-service -t 999w`
   2. The consumer reads a few messages and then eventually fails 
with the above error. It does not appear to attempt any recovery (e.g. reconnection, 
termination); it just gets stuck.
   3. If we terminate the service manually, it resumes consuming and then, 
after a while, gets stuck again.
   
   **Expected behavior**
   I would expect the consumer not to block and not to raise any `std::bad_alloc` error.
   
   **Screenshots**
   No screenshots available.
   
   **Desktop (please complete the following information):**
    - OS: Kubernetes on GCP
   
   **Additional context**
   I cross-referenced our consumer logs with the Pulsar logs to see what happens on the 
Pulsar side when we get the `std::bad_alloc` errors, and found some 
interesting exceptions that seem to occur at the same time as those 
errors (see attached report).
   
   Some errors are particularly interesting and make me think that we might 
have issues when reading entries from a ledger (BookKeeper). Can you suggest 
how to debug this further? Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

