[GitHub] [pulsar] poorbarcode commented on a diff in pull request #20649: [improve][broker] Avoid producer latency rise during internal ledger trimming operations

via GitHub Mon, 03 Jul 2023 07:59:30 -0700


poorbarcode commented on code in PR #20649:
URL: https://github.com/apache/pulsar/pull/20649#discussion_r1251003645



##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java:
##########
@@ -2451,15 +2451,15 @@ private void trimConsumedLedgersInBackground() {
 
     @Override
     public void trimConsumedLedgersInBackground(CompletableFuture<?> promise) {
-        executor.execute(() -> internalTrimConsumedLedgers(promise));
+        scheduledExecutor.execute(() -> internalTrimConsumedLedgers(promise));
     }
 
     public void trimConsumedLedgersInBackground(boolean isTruncate, 
CompletableFuture<?> promise) {

Review Comment:
   @315157973 
   
   > Assuming that one trim task increase 0.01ms, if there are tens of 
thousands of partitions, it will increase a lot.
   Especially when there are many small packets sent to the cluster， this 
situation will get worse
   
   I agree with you. But I'm wondering if the current changes will solve the 
issue.
   ![截屏2023-07-03 22 29 
13](https://github.com/apache/pulsar/assets/25195800/d681cb43-bc05-4a4e-ab46-1a0c6892eea2)
   
   In the picture above, the logic of `trim ledgers` can be split into two 
sections: Verification(not in the lock block) and Execution(in the lock block), 
and there have some async tasks in Section Execution which will run in other 
threads. The three tasks execute as the flow below.
   
   | Without lock | In lock | In other threads |
   | --- | --- | --- |
   | Verification | Memory changes | Persist changes |
   
   Since there is the lock `synchronized(ml)`, we can only reduce the executor 
thread cost of the task `Verification`, which is very simple. It only tries to 
acquire the lock and is done if it fails.
   
   **(Highlight)** However, this change also makes the scramble of the lock 
`synchronized(ml)` more frequent, which can worsen performance.
   
   For example, the `topic-a` and `topic-b` were assigned the same thread: 
`executor-1.` And the task `trim ledgers` will be executed in the thread 
`scheduled executor`, these tasks will be executed as flow below
   
   | time |`executor-1` | `scheduled executor` |
   | --- | --- | --- |
   | 1 | publish of `topic-a` |
   | 2 | publish of `topic-b` |
   | 3 | publish of `topic-b` | start `trim ledgers` of `topic-a` |
   | 4 | publish of `topic-b` | get the ML lock of `topic-a` |
   | 5 | publish of `topic-b` |
   | 6 | publish of `topic-a` |
   | 7 | waiting the lock, all publish of `topic-a` and `topic-b` will be 
blocked util the task `trim ledgers` of `topic-a` is complete |
   
   So in step-4, if the task `trim ledgers` of `topic-a` is executed, the 
performance will be better; if the task `trim ledgers` of `topic-a` is 
executed, the performance will be worse. If more and more topics are assigned 
to the same thread, the performance will be better(maybe we can add a config to 
disable this feature, it is helpful for the users who have few topics).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [pulsar] poorbarcode commented on a diff in pull request #20649: [improve][broker] Avoid producer latency rise during internal ledger trimming operations

Reply via email to