codelipenghui commented on PR #20649: URL: https://github.com/apache/pulsar/pull/20649#issuecomment-1624828400
I have tried to reproduce the issue over the last two days. It looks like what kills publish latency when the broker trims many ledgers is ZooKeeper operation latency, because all pending add-entry requests need to wait for the new ledger to be created. I don't see the trim operation exhausting CPU resources, and I don't see contention introduced by the trim operation.

#### Test Steps

The test runs on 3 brokers and 3 bookies, with 10 CPU cores assigned to the broker and 6 cores assigned to the bookie. I changed the max ledger rollover time to 2 minutes to make sure each topic has enough ledgers.

1. Start a test that writes and reads messages on 30k topics (non-partitioned topics)
2. Set the data retention to 2 hours
3. Wait for 2 hours to make sure there are many ledgers to trim
4. After the 2 hours, set the data retention to 2 minutes
5. Check the dashboard: the publish latency has been impacted.

The P99 publish latency goes up to 15s during the ledger trimming.

The ZooKeeper update latency also goes up to 15s during the ledger trimming.

For each bookie, almost 1 million ledgers were deleted.

There is no obvious broker CPU spike.

The flame graphs were captured during the ledger trimming.

[trim_cpu_0707.html.txt](https://github.com/apache/pulsar/files/11975299/trim_cpu_0707.html.txt)
[trim_lock_0707.html.txt](https://github.com/apache/pulsar/files/11975300/trim_lock_0707.html.txt)

#### Conclusion

I doubt this PR can resolve the publish latency spike during ledger trimming. From the above test, I don't see that the bottleneck comes from contention or CPU exhaustion. Maybe if you have a very high-performance ZooKeeper, ZooKeeper will not be the bottleneck, and then you can look for other bottlenecks. But ZooKeeper reached 8.5k ops per second in this test, and I don't think ZooKeeper is designed for such high-performance requirements.

Another solution is to introduce a throttling mechanism for the trim operation.
That way we can prevent a huge impact on the cluster when trimming ledgers. Trimming the ledgers a bit later is not a big problem for users, and throttling would make the trim operation smoother.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
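To make the throttling idea from the conclusion more concrete, here is a minimal sketch of a token-bucket limiter for ledger deletions. It is only an illustration under stated assumptions: `TrimThrottle`, `tryAcquire`, and `drain` are hypothetical names, not Pulsar or BookKeeper APIs, the rate and burst values are arbitrary, and the clock is passed in explicitly so the behavior is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical token-bucket throttle for ledger deletions; not Pulsar code. */
final class TrimThrottle {
    private final double permitsPerSecond; // sustained deletions per second
    private final double burst;            // maximum permits that can accumulate
    private double available;
    private long lastRefillNanos;

    TrimThrottle(double permitsPerSecond, double burst, long nowNanos) {
        this.permitsPerSecond = permitsPerSecond;
        this.burst = burst;
        this.available = burst;            // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Returns how many of the requested deletions may run now (0..requested). */
    synchronized int tryAcquire(int requested, long nowNanos) {
        if (nowNanos > lastRefillNanos) {
            // Refill in proportion to elapsed time, capped at the burst size.
            double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
            available = Math.min(burst, available + elapsedSeconds * permitsPerSecond);
            lastRefillNanos = nowNanos;
        }
        int granted = (int) Math.min(requested, Math.floor(available));
        available -= granted;
        return granted;
    }

    /** Removes up to the granted number of ledger ids from the backlog and returns them. */
    List<Long> drain(Queue<Long> backlog, long nowNanos) {
        int granted = tryAcquire(backlog.size(), nowNanos);
        List<Long> batch = new ArrayList<>(granted);
        for (int i = 0; i < granted; i++) {
            batch.add(backlog.poll());
        }
        return batch;
    }
}
```

In this sketch a periodic task would call `drain` with the current time and delete only the returned batch. Ledgers that were not granted a permit stay in the backlog and are trimmed on a later tick, which spreads the metadata-store updates over time instead of issuing them all at once.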
