hozumi opened a new issue #12036:
URL: https://github.com/apache/pulsar/issues/12036


   Hi,
   I asked about the high CPU usage of my brokers on the Pulsar Slack channel 
several months ago, but I can no longer find that post.
   I just want to share that I solved the problem by fixing the producer's 
batch configuration.
   
   I thought that I had already enabled batching, but I had actually set the 
following incorrect configuration.
   
   1. A batch duration of 3000 microseconds instead of 3000 ms.
   
   ```
       .batchingMaxPublishDelay(3000, TimeUnit.MICROSECONDS)
   ```
   Yeah, this is a silly mistake.
   Also, it should be noted that the default value of `batchingMaxPublishDelay` 
is `1ms`, which I think gives practically no batching effect.
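
   To make the unit mix-up concrete, here is a minimal sketch that uses only 
`java.util.concurrent.TimeUnit` to convert both settings to milliseconds. The 
class `BatchDelayUnits` and the helper `delayMillis` are made up for 
illustration and are not part of the Pulsar client.

```java
import java.util.concurrent.TimeUnit;

// Illustration only: class and method names are hypothetical, not Pulsar API.
class BatchDelayUnits {
    // Convert a (value, unit) pair, as passed to batchingMaxPublishDelay,
    // into whole milliseconds.
    static long delayMillis(long value, TimeUnit unit) {
        return unit.toMillis(value);
    }

    public static void main(String[] args) {
        // The mistaken setting: 3000 microseconds is only 3 ms.
        System.out.println(delayMillis(3000, TimeUnit.MICROSECONDS) + " ms");
        // The intended setting: 3000 milliseconds, i.e.
        // .batchingMaxPublishDelay(3000, TimeUnit.MILLISECONDS)
        System.out.println(delayMillis(3000, TimeUnit.MILLISECONDS) + " ms");
    }
}
```

   With the mistaken unit, the producer waits at most 3 ms before flushing a 
batch, which is close to the `1ms` default.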
   
   2. Unnecessary KEY_BASED BatcherBuilder
   ```
      .batcherBuilder(BatcherBuilder.KEY_BASED)
   ```
   I somehow thought that `BatcherBuilder.KEY_BASED` was necessary in order to 
send messages with the same key to the same partition.
   A batch made with KEY_BASED contains only messages with the same key, which 
resulted in a huge number of single-message batches in my use case.
   ```
   Key based batch message container
   incoming single messages:
      (k1, v1), (k2, v1), (k3, v1), (k1, v2), (k2, v2), (k3, v2), (k1, v3), (k2, v3), (k3, v3)
   batched into multiple batch messages:
      [(k1, v1), (k1, v2), (k1, v3)], [(k2, v1), (k2, v2), (k2, v3)], [(k3, v1), (k3, v2), (k3, v3)]
   ```
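
   The effect on batch sizes can be sketched with a small simulation. This is 
a simplified model, not the Pulsar client's actual batch container code; the 
class `BatchGroupingSketch` and its methods are hypothetical. It assumes that 
every message arriving within one flush window has a different key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of batch grouping; not the Pulsar client's implementation.
class BatchGroupingSketch {
    // Default-style grouping: all pending messages share one batch.
    // Each message is a {key, value} pair.
    static List<List<String>> singleBatch(List<String[]> messages) {
        List<String> batch = new ArrayList<>();
        for (String[] m : messages) {
            batch.add(m[0] + "=" + m[1]);
        }
        List<List<String>> batches = new ArrayList<>();
        if (!batch.isEmpty()) {
            batches.add(batch);
        }
        return batches;
    }

    // Key-based-style grouping: one batch per distinct key.
    static List<List<String>> perKeyBatches(List<String[]> messages) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String[] m : messages) {
            byKey.computeIfAbsent(m[0], k -> new ArrayList<>()).add(m[0] + "=" + m[1]);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        // Six messages, each with a distinct key, within one flush window.
        List<String[]> messages = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            messages.add(new String[] {"k" + i, "v1"});
        }
        System.out.println("default-style batches: " + singleBatch(messages).size());
        System.out.println("key-based batches:     " + perKeyBatches(messages).size());
    }
}
```

   When keys rarely repeat within a flush window, the per-key grouping 
produces one batch per message, so batching saves nothing.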
    
   Since the partitioned producer in the default routing mode already assigns 
each keyed message to a particular partition, I don't need 
`BatcherBuilder.KEY_BASED` for my use cases.
   https://pulsar.apache.org/docs/en/admin-api-topics/#routing-mode
   > RoundRobinPartition
   > If a key is specified on the message, the partitioned producer hashes the 
key and assigns message to a particular partition. This is the default mode.
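
   The routing described in the quote can be sketched as a hash of the key 
modulo the partition count. This sketch assumes a plain `String.hashCode()` 
based hash; the hashing scheme actually used by the client is configurable and 
may differ. `KeyRoutingSketch` and `partitionFor` are hypothetical names.

```java
// Illustration of key-to-partition routing; not the Pulsar client's code.
class KeyRoutingSketch {
    // Map a message key to a partition index in [0, numPartitions).
    // Assumption: a String.hashCode()-based hash with the sign bit cleared.
    static int partitionFor(String key, int numPartitions) {
        int hash = key.hashCode() & Integer.MAX_VALUE;
        return hash % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (String key : new String[] {"k1", "k2", "k3", "k1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

   The partition depends only on the key and the partition count, so messages 
with the same key land in the same partition regardless of which 
BatcherBuilder is used.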
   
   For those who encounter a similar performance problem, I recommend 
checking the actual number of batched messages with CLI commands such as 
`examine-messages`, `peek-messages` and `get-message-by-id`.
   The number of batched messages is shown as `X-Pulsar-num-batch-message`.
   
   ```
   $ docker exec -it pulsar_broker bin/pulsar-admin topics examine-messages \
       --initialPosition latest \
       "persistent://mytenant/mynamespace/mytopic-partition-0" | head
   Message ID: 4572594:27489
   Tenants:
   "X-Pulsar-batch-size    23678"
   "X-Pulsar-num-batch-message    48"
   ...
   $ docker exec -it pulsar_broker bin/pulsar-admin topics get-message-by-id \
       --ledgerId 4572594 --entryId 27489 \
       "persistent://mytenant/mynamespace/mytopic-partition-0"
   Batch Message ID: 4572594:27489:0
   Properties:
   "X-Pulsar-batch-size    23678"
   "X-Pulsar-num-batch-message    48"
   ...
   $ docker exec -it pulsar_broker bin/pulsar-admin topics peek-messages \
       --subscription mysub1 --count 1 \
       "persistent://mytenant/mynamespace/mytopic-partition-0" | head
   Batch Message ID: 4572594:33046:0
   Publish time: 1631608014336
   Event time: 0
   Properties:
   "X-Pulsar-batch-size    20086"
   "X-Pulsar-num-batch-message    43"
   ```
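
   If you want to check this value from a script, the property can be pulled 
out of the captured CLI output with a regular expression. This is a minimal 
sketch that assumes the output looks like the listings above (property name, 
whitespace, then the count); the format may differ between Pulsar versions. 
`BatchCountParser` is a hypothetical helper, not part of Pulsar.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the batch count from pulsar-admin output.
class BatchCountParser {
    // Assumed line format: "X-Pulsar-num-batch-message    48"
    private static final Pattern NUM_BATCH =
            Pattern.compile("X-Pulsar-num-batch-message\\s+(\\d+)");

    // Return the first batch count found in the output, if any.
    static OptionalInt numBatchMessages(String cliOutput) {
        Matcher m = NUM_BATCH.matcher(cliOutput);
        if (m.find()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        String sample = "Properties:\n"
                + "\"X-Pulsar-batch-size    23678\"\n"
                + "\"X-Pulsar-num-batch-message    48\"\n";
        System.out.println(numBatchMessages(sample));
    }
}
```

   A count that stays at 1, or no property at all, would suggest that 
batching is not taking effect.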



