heesung-sn opened a new pull request, #19343:
URL: https://github.com/apache/pulsar/pull/19343

   <!--
   ### Contribution Checklist
     
     - PR title format should be *[type][component] summary*. For details, see 
*[Guideline - Pulsar PR Naming 
Convention](https://docs.google.com/document/d/1d8Pw6ZbWk-_pCKdOmdvx9rnhPiyuxwq60_TrD68d7BA/edit#heading=h.trs9rsex3xom)*.
 
   
     - Fill out the template below to describe the changes contributed by the 
pull request. That will give reviewers the context they need to do the review.
     
     - Each pull request should address only one issue, not mix up code from 
multiple issues.
     
     - Each commit in the pull request has a meaningful commit message
   
     - Once all items of the checklist are addressed, remove the above text and 
this checklist, leaving only the filled out template below.
   -->
   
   
   
   ### Motivation
   
   Related to https://github.com/apache/pulsar/pull/19341,
   
   - When there are many concurrent subscriptions in a topic(10k> ), brokers' 
performance degrades because `checkBackloggedCursors()` iterates all 
subscriptions for each subscribe call.
   
   ```
   "pulsar-io-4-2" #77 prio=5 os_prio=0 cpu=17289.90ms elapsed=170.09s 
tid=0x00007f85b0085800 nid=0xa2 runnable  [0x00007f859d05e000]
      java.lang.Thread.State: RUNNABLE
           at 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.deactivateCursor(ManagedLedgerImpl.java:3526)
           - locked <0x00007f8921b24c58> (a 
org.apache.bookkeeper.mledger.impl.ManagedCursorContainer)
           at 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.setInactive(ManagedCursorImpl.java:1116)
           at 
org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$checkBackloggedCursors$95(PersistentTopic.java:2363)
           at 
org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1346/0x00007f8465c23cb0.accept(Unknown
 Source)
           at 
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:544)
           at 
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:272)
   --
           at java.lang.Thread.run([email protected]/Thread.java:829)
   
   "pulsar-io-4-3" #78 prio=5 os_prio=0 cpu=17224.94ms elapsed=170.09s 
tid=0x00007f85b0087000 nid=0xa3 waiting for monitor entry  [0x00007f859cf5d000]
      java.lang.Thread.State: BLOCKED (on object monitor)
           at 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.deactivateCursor(ManagedLedgerImpl.java:3526)
           - waiting to lock <0x00007f8921b24c58> (a 
org.apache.bookkeeper.mledger.impl.ManagedCursorContainer)
           at 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.setInactive(ManagedCursorImpl.java:1116)
           at 
org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$checkBackloggedCursors$95(PersistentTopic.java:2363)
           at 
org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1346/0x00007f8465c23cb0.accept(Unknown
 Source)
           at 
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:544)
           at 
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:272)
   --
           at java.lang.Thread.run([email protected]/Thread.java:829)
   ```
   
   - Running `checkBackloggedCursors()` in the subscription path is redundant 
as `checkBackloggedCursors()` is called in `updateStats()` every 
`statsUpdateFrequencyInSecs`(by default 1min).
   
   
   ### Modifications
   
   Removed `checkBackloggedCursors()` from `subscribe()` call.
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   
   This change can be verified as follows:
   
   **simulate many concurrent subscriptions** 
   ```
   #!/bin/sh
   
   while :
   do
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest -time 3 -ss a &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss b &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss c &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss d &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss e &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss f &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss g &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss h &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss i &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss j &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss k &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss l &
      bin/pulsar-perf consume my-topic -r 10 -ns 10000 -st Exclusive -sp 
Earliest  -time 3 -ss m &
   
      for job in `jobs -p`
        do
          echo $job
          wait $job
        done
   done
   ```
   
   
   
   **before this PR**
   ```
   bin/pulsar-perf produce my-topic -r 10 -m 1000
   2023-01-27T18:47:30,430-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:       0 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:47:40,510-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---     11.8 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 6834.356 ms - med: 6834.815 - 95pct: 12128.831 - 99pct: 
12629.887 - 99.9pct: 12728.447 - 99.99pct: 12728.447 - Max: 12728.447
   2023-01-27T18:47:50,531-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:48:00,555-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:48:10,583-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:48:20,616-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:48:30,656-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:48:40,683-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:48:50,711-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:49:00,734-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     119 
msg ---      0.0 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean:   0.000 ms - med:   0.000 - 95pct:   0.000 - 99pct:   0.000 - 
99.9pct:   0.000 - 99.99pct:   0.000 - Max:   0.000
   2023-01-27T18:49:01,300-0800 [pulsar-perf-producer-exec-1-1] INFO  
org.apache.pulsar.testclient.PerformanceProducer - ------------- DONE (reached 
the maximum number: 1000 of production) --------------
   2023-01-27T18:49:06,306-0800 [Thread-0] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Aggregated throughput stats 
--- 119 records sent --- 1.123 msg/s --- 0.009 Mbit/s
   2023-01-27T18:49:06,337-0800 [Thread-0] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Aggregated latency stats --- 
Latency: mean: 6834.356 ms - med: 6834.815 - 95pct: 12128.831 - 99pct: 
12629.887 - 99.9pct: 12728.447 - 99.99pct: 12728.447 - 99.999pct: 12728.447 - 
Max: 12728.447
   
   ```
   
   **after this PR**
   ```
    bin/pulsar-perf produce my-topic -r 10 -m 1000
   
   2023-01-27T21:26:43,991-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:      15 
msg ---      1.5 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 5465.266 ms - med: 6249.727 - 95pct: 7482.431 - 99pct: 7584.639 
- 99.9pct: 7584.639 - 99.99pct: 7584.639 - Max: 7584.639
   2023-01-27T21:26:54,025-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:      62 
msg ---      4.6 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 9959.775 ms - med: 10424.703 - 95pct: 12068.927 - 99pct: 
12823.935 - 99.9pct: 12823.935 - 99.99pct: 12823.935 - Max: 12823.935
   2023-01-27T21:27:04,048-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:      97 
msg ---      3.5 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 16149.440 ms - med: 16641.855 - 95pct: 19121.023 - 99pct: 
19414.143 - 99.9pct: 19414.143 - 99.99pct: 19414.143 - Max: 19414.143
   2023-01-27T21:27:14,070-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     131 
msg ---      3.4 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 22474.266 ms - med: 22154.495 - 95pct: 24488.959 - 99pct: 
25234.559 - 99.9pct: 25234.559 - 99.99pct: 25234.559 - Max: 25234.559
   2023-01-27T21:27:24,106-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     192 
msg ---      6.1 msg/s ---      0.0 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 28824.695 ms - med: 28932.479 - 95pct: 30471.679 - 99pct: 
30507.007 - 99.9pct: 30516.479 - 99.99pct: 30516.479 - Max: 30516.479
   2023-01-27T21:27:34,143-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     269 
msg ---      7.7 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 31316.609 ms - med: 30886.527 - 95pct: 32863.231 - 99pct: 
32963.199 - 99.9pct: 33045.759 - 99.99pct: 33045.759 - Max: 33045.759
   2023-01-27T21:27:44,178-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     418 
msg ---     14.9 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 30904.110 ms - med: 31308.287 - 95pct: 33457.919 - 99pct: 
33696.511 - 99.9pct: 33730.303 - 99.99pct: 33730.303 - Max: 33730.303
   2023-01-27T21:27:54,205-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     792 
msg ---     37.3 msg/s ---      0.3 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 12566.494 ms - med: 11749.119 - 95pct: 25876.223 - 99pct: 
27032.319 - 99.9pct: 27429.375 - 99.99pct: 27429.375 - Max: 27429.375
   2023-01-27T21:28:04,231-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     892 
msg ---     10.0 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 158.367 ms - med: 103.553 - 95pct: 512.813 - 99pct: 626.371 - 
99.9pct: 729.767 - 99.99pct: 729.767 - Max: 729.767
   2023-01-27T21:28:14,254-0800 [main] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:     992 
msg ---     10.0 msg/s ---      0.1 Mbit/s  --- failure      0.0 msg/s --- 
Latency: mean: 113.023 ms - med:  87.656 - 95pct: 275.551 - 99pct: 362.147 - 
99.9pct: 384.981 - 99.99pct: 384.981 - Max: 384.981
   2023-01-27T21:28:14,871-0800 [pulsar-perf-producer-exec-1-1] INFO  
org.apache.pulsar.testclient.PerformanceProducer - ------------- DONE (reached 
the maximum number: 1000 of production) --------------
   2023-01-27T21:28:19,879-0800 [Thread-0] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Aggregated throughput stats 
--- 1000 records sent --- 9.434 msg/s --- 0.074 Mbit/s
   2023-01-27T21:28:19,942-0800 [Thread-0] INFO  
org.apache.pulsar.testclient.PerformanceProducer - Aggregated latency stats --- 
Latency: mean: 15404.022 ms - med: 14796.287 - 95pct: 32638.591 - 99pct: 
33425.023 - 99.9pct: 33696.511 - 99.99pct: 33730.303 - 99.999pct: 33730.303 - 
Max: 33730.303
   ```
   
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. Please attach the local 
preview screenshots (run `sh start.sh` at `pulsar/site2/website`) to your PR 
description, or else your PR might not get merged. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/heesung-sn/pulsar/pull/24
   
   <!--
   After opening this PR, the build in apache/pulsar will fail and instructions 
will
   be provided for opening a PR in the PR author's forked repository.
   
   apache/pulsar pull requests should be first tested in your own fork since 
the 
   apache/pulsar CI based on GitHub Actions has constrained resources and quota.
   GitHub Actions provides separate quota for pull requests that are executed 
in 
   a forked repository.
   
   The tests will be run in the forked repository until all PR review comments 
have
   been handled, the tests pass and the PR is approved by a reviewer.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to