michalcukierman commented on issue #21074: URL: https://github.com/apache/pulsar/issues/21074#issuecomment-1695541238
@coderzc This is not a problem with duplicated messages. Check the [comment](https://github.com/apache/pulsar/issues/21074#issuecomment-1694749730); I put some numbers there.

- Produced messages: 60k
- Messages in compacted ledger: 87051
- Consumed messages: 127k and growing (I stopped at 250k)

So the issue is not duplicated messages but a redelivery loop, and sometimes message loss. In both cases the backlog (Grafana and topic stats) does not change over time.

Also, if you had duplicated messages in the compacted ledger, you would expect different keys, right? I get the same key, from the same ledger, over and over:

```
# 306 -> (144,0,1,-1)
# 5500 -> (144,0,1,-1)
# 8862 -> (144,0,1,-1)
# 11079 -> (144,0,1,-1)
# 12825 -> (144,0,1,-1)
# 15426 -> (144,0,1,-1)
# 17804 -> (144,0,1,-1)
# 19672 -> (144,0,1,-1)
# 23374 -> (144,0,1,-1)
```

So just in the example above I read `23374` messages during a single run, and probably around 100k during the last hour. The compacted ledgers contain exactly 10k entries in total:

```bash
➜ chaos-test-pulsar-perf git:(main) ✗ for i in {0..12} ; do kubectl exec --namespace pulsar -t pulsar-toolset-0 -- bin/pulsar-admin topics stats-internal test-compaction-0-partition-$i | jq -r ".compactedLedger.entries" ;done
851
835
816
833
803
824
823
837
823
842
843
870
```
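As a quick sanity check on the arithmetic, the per-partition values printed above can be summed and compared with the consumer counter from the log excerpt. This is only a sketch: the numbers are copied by hand from this comment, the variable names are made up for illustration, and the comparison assumes the consumer is reading only from the compacted ledgers.

```bash
#!/usr/bin/env bash
# Per-partition compactedLedger.entries values, copied from the output above.
entries=(851 835 816 833 803 824 823 837 823 842 843 870)

# Highest consumer counter seen in the log excerpt for a single run.
consumed=23374

# Sum the compacted entries across all reported partitions.
total=0
for n in "${entries[@]}"; do
  total=$((total + n))
done

echo "partitions reported: ${#entries[@]}"
echo "total compacted entries: ${total}"

# If more messages were consumed than exist in the compacted ledgers,
# at least some of them must have been delivered more than once.
if [ "$consumed" -gt "$total" ]; then
  echo "consumed ${consumed} > ${total} entries: some messages were redelivered"
fi
```

With these numbers the consumer has already read more than twice the number of compacted entries in one run, which fits a redelivery loop better than a handful of duplicates.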
