ckdarby opened a new issue #7058: URL: https://github.com/apache/pulsar/issues/7058
**Describe the bug**
Reading the backlog of a partitioned topic tops out at roughly 60-100 MB/s per partition, even though the bookies can read 200-300 MB/s directly off EBS and no other bottleneck is visible.

**To Reproduce**
Steps to reproduce the behavior:

1. We're using the Pulsar Helm chart on AWS EKS: https://github.com/apache/pulsar-helm-chart/commit/6e9ad25ba322f6f0fc7c11c66fb88faa6d0218db
2. Our values.yaml overrides look like this:

   ```yaml
   pulsar:
     namespace: cory-ebs-test
     components:
       pulsar_manager: false  # UI is outdated and won't load without errors
     auth:
       authentication:
         enabled: true
     bookkeeper:
       resources:
         requests:
           memory: 11560Mi
           cpu: 1.5
       volumes:
         journal:
           size: 100Gi
         ledgers:
           size: 5Ti
       configData:
         # `BOOKIE_MEM` is used for `bookie shell`
         BOOKIE_MEM: >
           "
           -Xms1280m
           -Xmx10800m
           -XX:MaxDirectMemorySize=10800m
           "
         # we use `bin/pulsar` for starting bookie daemons
         PULSAR_MEM: >
           "
           -Xms10800m
           -Xmx10800m
           -XX:MaxDirectMemorySize=10800m
           "
         # configure the memory settings based on jvm memory settings
         dbStorage_writeCacheMaxSizeMb: "2500"           # pulsar docs say 25%
         dbStorage_readAheadCacheMaxSizeMb: "2500"       # pulsar docs say 25%
         dbStorage_rocksDB_writeBufferSizeMB: "64"       # pulsar docs had 64
         dbStorage_rocksDB_blockCacheSize: "1073741824"  # pulsar docs say 10%
         readBufferSizeBytes: "8096"                     # attempted doubling
     autorecovery:
       resources:
         requests:
           memory: 2048Mi
           cpu: 1
       configData:
         BOOKIE_MEM: >
           "
           -Xms1500m
           -Xmx1500m
           "
     broker:
       resources:
         requests:
           memory: 4096Mi
           cpu: 1
       configData:
         PULSAR_MEM: >
           "
           -Xms1024m
           -Xmx4096m
           -XX:MaxDirectMemorySize=4096m
           -Dio.netty.leakDetectionLevel=disabled
           -Dio.netty.recycler.linkCapacity=1024
           -XX:+ParallelRefProcEnabled
           -XX:+UnlockExperimentalVMOptions
           -XX:+DoEscapeAnalysis
           -XX:ParallelGCThreads=4
           -XX:ConcGCThreads=4
           -XX:G1NewSizePercent=50
           -XX:+DisableExplicitGC
           -XX:-ResizePLAB
           -XX:+ExitOnOutOfMemoryError
           -XX:+PerfDisableSharedMem
           "
     proxy:
       resources:
         requests:
           memory: 4096Mi
           cpu: 1
       configData:
         PULSAR_MEM: >
           "
           -Xms1024m
           -Xmx4096m
           -XX:MaxDirectMemorySize=4096m
           -Dio.netty.leakDetectionLevel=disabled
           -Dio.netty.recycler.linkCapacity=1024
           -XX:+ParallelRefProcEnabled
           -XX:+UnlockExperimentalVMOptions
           -XX:+DoEscapeAnalysis
           -XX:ParallelGCThreads=4
           -XX:ConcGCThreads=4
           -XX:G1NewSizePercent=50
           -XX:+DisableExplicitGC
           -XX:-ResizePLAB
           -XX:+ExitOnOutOfMemoryError
           -XX:+PerfDisableSharedMem
           "
       service:
         annotations:
           service.beta.kubernetes.io/aws-load-balancer-type: nlb
           external-dns.alpha.kubernetes.io/hostname: pulsar.internal.ckdarby
     toolset:
       resources:
         requests:
           memory: 1028Mi
           cpu: 1
       configData:
         PULSAR_MEM: >
           "
           -Xms640m
           -Xmx1028m
           -XX:MaxDirectMemorySize=1028m
           "
     grafana:
       service:
         annotations:
           external-dns.alpha.kubernetes.io/hostname: grafana.internal.ckdarby
       admin:
         user: admin
         password: 12345
   ```
3. Produce messages to a partitioned topic:
   - 8 partitions
   - average message size is ~1.5 KB
   - retention set to 7 days
   - we're storing ~2-8 TB of retention at times
4. Attempt to consume messages with the offset set to earliest, thus skipping the RocksDB read cache and going to the backlog. We have tried the Flink Pulsar connector as well as Pulsar's perf reader from the toolset pod on a single-partition topic (a minimal client-side sketch of this read follows the list). The perf reader ran with this config:

   ```json
   {
     "confFile" : "/pulsar/conf/client.conf",
     "topic" : [ "persistent://public/cory/test-ebs-partition-5" ],
     "numTopics" : 1,
     "rate" : 0.0,
     "startMessageId" : "earliest",
     "receiverQueueSize" : 1000,
     "maxConnections" : 100,
     "statsIntervalSeconds" : 0,
     "serviceURL" : "pulsar://cory-ebs-test-pulsar-proxy:6650/",
     "authPluginClassName" : "org.apache.pulsar.client.impl.auth.AuthenticationToken",
     "authParams" : "file:///pulsar/tokens/client/token",
     "useTls" : false,
     "tlsTrustCertsFilePath" : ""
   }
   ```
5. Check Grafana, the EBS graphs, etc.:
   - See really poor performance from Pulsar, 60-100 MB/s on the partition
   - Don't see any bottlenecks
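For reference, here is a minimal Java-client equivalent of the perf-reader run above. This is a sketch, not part of the original report: the class name and the throughput-printing loop are illustrative, while the service URL, auth plugin, token path, topic, and receiver queue size are taken from the config dump in step 4.

```java
import org.apache.pulsar.client.api.AuthenticationFactory;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class BacklogReadRepro {
    public static void main(String[] args) throws Exception {
        // Same connection settings as the perf-reader config dump above.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://cory-ebs-test-pulsar-proxy:6650/")
                .authentication(AuthenticationFactory.create(
                        "org.apache.pulsar.client.impl.auth.AuthenticationToken",
                        "file:///pulsar/tokens/client/token"))
                .build();

        // startMessageId(earliest) forces the read down to the bookies'
        // backlog instead of serving from the brokers' tailing cache.
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/cory/test-ebs-partition-5")
                .startMessageId(MessageId.earliest)
                .receiverQueueSize(1000)
                .create();

        long msgs = 0, bytes = 0;
        long start = System.nanoTime();
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            msgs++;
            bytes += msg.getData().length;
            if (msgs % 100_000 == 0) {
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("%,d msgs read, %.1f MB/s%n",
                        msgs, bytes / secs / 1e6);
            }
        }
        reader.close();
        client.close();
    }
}
```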
**Expected behavior**
Pulsar is reading 60-100 MB/s off each partition (the perf reader below reports 650-950 Mbit/s, i.e. roughly 80-120 MB/s). Would expect something closer to what the bookie is actually able to read off EBS, 200-300 MB/s.

**Additional context**
Here is a real example of everything I could pull. The perf reader starts at 18:31:17 UTC and ends at 18:46:37 UTC; all the graphs cover that window and are in UTC.

**Perf Reader Output**
```text
18:31:17.389 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 58250.685 msg/s -- 647.672 Mbit/s
18:31:27.389 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 58523.641 msg/s -- 667.659 Mbit/s
18:31:37.390 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 61314.984 msg/s -- 688.519 Mbit/s
18:31:47.390 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 64920.905 msg/s -- 748.406 Mbit/s
18:31:57.390 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 64340.229 msg/s -- 732.601 Mbit/s
...
18:42:17.416 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 64034.036 msg/s -- 723.160 Mbit/s
18:42:27.419 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 63048.031 msg/s -- 700.458 Mbit/s
18:42:37.421 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 69958.533 msg/s -- 817.095 Mbit/s
18:42:47.422 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 69898.133 msg/s -- 827.770 Mbit/s
18:42:57.422 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 62989.179 msg/s -- 726.990 Mbit/s
18:43:07.422 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 63500.736 msg/s -- 728.683 Mbit/s
...
18:45:37.430 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 55052.395 msg/s -- 645.263 Mbit/s
18:45:47.431 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 72004.353 msg/s -- 804.856 Mbit/s
18:45:57.431 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 86224.170 msg/s -- 954.399 Mbit/s
18:46:07.431 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 80231.708 msg/s -- 905.096 Mbit/s
18:46:17.432 [main] INFO org.apache.pulsar.testclient.PerformanceReader - Read throughput: 73065.824 msg/s -- 864.556 Mbit/s
```

**Bookie reading directly from EBS**
Disk cache was flushed beforehand, and this was captured before running the perf reader. *(screenshot)*

**EC2 instances**
- Amount: 13
- Type: r5.large
- AZ: all in us-west-2c, all within Kubernetes

**EBS**
*(screenshot)*

**Grafana Overview**
*(screenshot)*

**JVM**
- Bookie *(screenshot)*
- Broker *(screenshot)*
- Recovery *(screenshot)*
- ZooKeeper *(screenshot)*

**Bookie**
*(two screenshots)*

**Specifically public/cory/test-ebs-partition-5**
*(screenshot)*
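As an aside for anyone triaging: the per-topic numbers behind the last screenshot can also be captured as text via the admin API. This is a minimal sketch, not from the original report; the HTTP service URL is an assumption (any broker's web-service endpoint works), and the public-field access matches the 2.5-era admin client.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;
import org.apache.pulsar.common.policies.data.TopicStats;

public class DumpTopicStats {
    public static void main(String[] args) throws Exception {
        // The HTTP URL below is an assumed placeholder for the brokers' web service.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://cory-ebs-test-pulsar-broker:8080")
                .authentication(
                        "org.apache.pulsar.client.impl.auth.AuthenticationToken",
                        "file:///pulsar/tokens/client/token")
                .build();

        String topic = "persistent://public/cory/test-ebs-partition-5";

        // Dispatch rate and storage size, i.e. what the Grafana topic panel shows.
        TopicStats stats = admin.topics().getStats(topic);
        System.out.println("msgRateOut=" + stats.msgRateOut
                + " msgThroughputOut=" + stats.msgThroughputOut
                + " storageSize=" + stats.storageSize);

        // Ledger breakdown of the backlog being read off the bookies.
        PersistentTopicInternalStats internal = admin.topics().getInternalStats(topic);
        System.out.println("ledgers=" + internal.ledgers.size()
                + " entries=" + internal.numberOfEntries);
        admin.close();
    }
}
```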
