michalcukierman commented on issue #21074:
URL: https://github.com/apache/pulsar/issues/21074#issuecomment-1695541238

   @coderzc It's not a problem with duplicated messages.
   Check the [comment](https://github.com/apache/pulsar/issues/21074#issuecomment-1694749730); I put some numbers there.
   - Produced messages: 60k
   - Messages in compacted ledger: 87051
   - Consumed messages: 127k and growing (I stopped at 250k)
   
   
   So the issue is not duplicated messages but a redelivery loop. Sometimes messages are also lost.
   In both cases the backlog (in Grafana and in the topic stats) does not change over time.
   
   Also, if there were duplicated messages in the compacted ledger, you would expect different keys, right?
   I get the same key, from the same ledger, over and over:
   ```
   # 306  ->  (144,0,1,-1)
   # 5500  ->  (144,0,1,-1)
   # 8862  ->  (144,0,1,-1)
   # 11079  ->  (144,0,1,-1)
   # 12825  ->  (144,0,1,-1)
   # 15426  ->  (144,0,1,-1)
   # 17804  ->  (144,0,1,-1)
   # 19672  ->  (144,0,1,-1)
   # 23374  ->  (144,0,1,-1)
   ```
   
   So, going by the example above alone, I read `23374` messages during a single run, and probably around 100k during the last hour.
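
To make the loop easier to see, the log lines above can be tallied per message ID. This is only a minimal sketch, assuming the log format shown above (`# <running count>  ->  (<message id tuple>)`); the helper name `count_redeliveries` is hypothetical and is not part of any Pulsar client.

```python
import re
from collections import Counter

# Assumed log format: "# <running count>  ->  (<comma-separated message id>)"
LINE_RE = re.compile(r"#\s*(\d+)\s*->\s*\(([^)]*)\)")


def count_redeliveries(log_text):
    """Return a Counter mapping each message ID tuple to how often it was logged."""
    seen = Counter()
    for match in LINE_RE.finditer(log_text):
        message_id = tuple(int(part) for part in match.group(2).split(","))
        seen[message_id] += 1
    return seen


# First three lines of the consumer output quoted above.
sample = """
# 306  ->  (144,0,1,-1)
# 5500  ->  (144,0,1,-1)
# 8862  ->  (144,0,1,-1)
"""

counts = count_redeliveries(sample)
# Any ID logged more than once was delivered repeatedly.
repeated = {mid: n for mid, n in counts.items() if n > 1}
print(repeated)  # {(144, 0, 1, -1): 3}
```

Any ID that appears in `repeated` was delivered more than once, which is what a redelivery loop would look like if the compacted ledger really holds that message only once.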
   The compacted ledger entries add up to exactly 10k:
   ```bash 
   ➜  chaos-test-pulsar-perf git:(main) ✗ for i in {0..12} ; do kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics stats-internal test-compaction-0-partition-$i | jq -r ".compactedLedger.entries" ; done
   851
   835
   816
   833
   803
   824
   823
   837
   823
   842
   843
   870
   ```
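
As a quick sanity check of the 10k figure, the per-partition values printed above can be summed and compared with the number of messages read in the single run. This is a sketch using only the numbers quoted in this comment; the variable names are illustrative.

```python
# Per-partition values of .compactedLedger.entries, copied from the output above.
entries_per_partition = [851, 835, 816, 833, 803, 824, 823, 837, 823, 842, 843, 870]

total_entries = sum(entries_per_partition)
print(total_entries)  # 10000

# Messages read during the single run quoted earlier in this comment.
messages_read = 23374

# If every compacted entry were delivered exactly once, this could not exceed total_entries.
excess_reads = messages_read - total_entries
print(excess_reads)  # 13374
```

A positive `excess_reads` means more messages were read than the compacted ledger holds, which is consistent with redelivery rather than with duplicates stored in the ledger.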

