michalcukierman commented on issue #21074:
URL: https://github.com/apache/pulsar/issues/21074#issuecomment-1694749730

   I was able to reproduce it with a topic with 60k messages.
   
   
   1. Create the topic and run compaction (to create the compaction subscription), then produce messages:
   ```bash
   TOPIC="test-compaction"
   kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics create-partitioned-topic $TOPIC -p 12
   kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics compact $TOPIC
   kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-perf produce -bm 1 -r 300 -m 10000 -s 102400 $TOPIC
   ```
   2. Connect with the client:
   ```python
   
   import pulsar
   
   client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('test-compaction',
                            subscription_name='python-consume',
                            initial_position=pulsar.InitialPosition.Earliest,
                            read_compacted=True)
   
   count=0
   while True:
       msg = consumer.receive()
       count += 1
       print(count)
       consumer.acknowledge(msg)
   
   client.close()
   ```
   
    Run the consumer by invoking:
   ```bash
   kubectl port-forward service/pulsar-proxy  6650:6650 -n pulsar & 
   python3 consumer.py 
   ```
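
   To tell when the consumer has genuinely drained the topic (rather than looping forever), the receive loop above can be given an idle timeout. This is only a sketch: the `drain` helper is a hypothetical name, and it assumes `receive(timeout_millis=...)` raises a timeout exception when no message arrives (`pulsar.Timeout` in recent Python clients):

   ```python
   def drain(consumer, idle_ms=5000, timeout_exc=Exception):
       """Receive and ack until no message arrives for `idle_ms` ms.

       Returns (total received, number of distinct message IDs), so
       redeliveries of the same entry show up as total > distinct.
       """
       total = 0
       distinct = set()
       while True:
           try:
               msg = consumer.receive(timeout_millis=idle_ms)
           except timeout_exc:
               return total, len(distinct)
           total += 1
           distinct.add(str(msg.message_id()))
           consumer.acknowledge(msg)
   ```

   With the real client this would be called as `drain(consumer, timeout_exc=pulsar.Timeout)`; a total that keeps growing past the number of compacted entries confirms the loop.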
   
   3. Run the compaction
   
   ```bash
   kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics compact ${TOPIC}
   ```
   
   4. Unload the topic
   
   ```bash
   kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics unload ${TOPIC}
   ```
   
   The result:
   - the Python client falls into an infinite loop; I stopped it after receiving `127557` messages
   - after restarting the client, it still receives new messages
   
   Other information:
   - Pulsar Manager showing `out rate` and `precise backlog`:
   <img width="1474" alt="Screenshot 2023-08-27 at 21 53 33" src="https://github.com/apache/pulsar/assets/4356553/a11da3b8-8680-4845-b7f1-0043c63e722f">
   
   - Message backlog is not dropping in Grafana (last 10 minutes without changes):
   <img width="839" alt="Screenshot 2023-08-27 at 21 58 23" src="https://github.com/apache/pulsar/assets/4356553/cbd18ae9-b5e8-4937-9021-464b309ea821">
   
   - Compacted ledger entries (87051):
   ```bash
   ➜  chaos-test-pulsar-perf git:(main) ✗ for i in {0..11} ; do kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics stats-internal test-compaction-2-partition-$i | jq -r ".compactedLedger.entries" ; done
   8718
   12692
   8486
   5113
   11155
   5077
   5113
   9513
   5147
   5151
   5147
   5739
   ```
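   
   As a sanity check, the per-partition counts printed above do add up to the 87051 compacted entries:
   
   ```python
   # Per-partition compactedLedger.entries values from the loop above
   entries = [8718, 12692, 8486, 5113, 11155, 5077,
              5113, 9513, 5147, 5151, 5147, 5739]
   assert len(entries) == 12  # one value per partition
   print(sum(entries))        # 87051
   ```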
   
   I used topic unloading just to reproduce the issue. In our case it could be caused by:
   - a broker restart after an OOM on direct memory
   - broker scaling (i.e. adding a broker)
   - topic re-balancing (bundle splitting)
   
   I see this issue a lot during broker restarts. Sometimes we observe it even without restarts.

