hozumi opened a new issue #12036:
URL: https://github.com/apache/pulsar/issues/12036
Hi,
Several months ago I asked on the Pulsar Slack channel about high CPU usage
on my brokers; I can no longer find that post.
I want to share that I solved the problem by fixing the producer's batching
configuration.
I thought I had already enabled batching, but I had actually set the
following two options incorrectly.
1. A batch duration of 3000 microseconds instead of 3000 ms.
```
.batchingMaxPublishDelay(3000, TimeUnit.MICROSECONDS)
```
Yes, this was a silly mistake: 3000 microseconds is only 3 ms.
It should also be noted that the default value of `batchingMaxPublishDelay` is
`1ms`, which I think gives almost no batching effect.
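To make the unit mix-up concrete, here is a minimal sketch using only `java.util.concurrent.TimeUnit`. The class name `BatchDelayCheck`, its helper methods, and the 500 msg/s publish rate are hypothetical and exist only for illustration; the estimate ignores every other batching limit the client may apply.

```java
import java.util.concurrent.TimeUnit;

// Illustrative helper, not part of the Pulsar client.
class BatchDelayCheck {
    // Express a (delay, unit) pair in milliseconds without truncating sub-millisecond values.
    static double toMillis(long delay, TimeUnit unit) {
        return unit.toMicros(delay) / 1000.0;
    }

    // Rough upper bound on messages collected in one delay window at a steady publish rate.
    static double expectedPerBatch(long messagesPerSecond, long delay, TimeUnit unit) {
        return messagesPerSecond * (unit.toMicros(delay) / 1_000_000.0);
    }

    public static void main(String[] args) {
        System.out.println(toMillis(3000, TimeUnit.MICROSECONDS)); // 3.0
        System.out.println(toMillis(3000, TimeUnit.MILLISECONDS)); // 3000.0

        long rate = 500; // hypothetical publish rate in messages per second
        System.out.println(expectedPerBatch(rate, 1, TimeUnit.MILLISECONDS));    // 0.5
        System.out.println(expectedPerBatch(rate, 3000, TimeUnit.MILLISECONDS)); // 1500.0
    }
}
```

At that assumed rate, a 1 ms window holds less than one message on average, which matches the impression that such a short delay gives little batching.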
2. Unnecessary KEY_BASED BatcherBuilder
```
.batcherBuilder(BatcherBuilder.KEY_BASED)
```
I had somehow assumed that `BatcherBuilder.KEY_BASED` was required in order to
send messages with the same key to the same partition.
A batch built with KEY_BASED contains only messages with the same key, which
in my use case resulted in a huge number of single-message batches.
```
Key based batch message container
incoming single messages:
(k1, v1), (k2, v1), (k3, v1), (k1, v2), (k2, v2), (k3, v2), (k1, v3), (k2, v3), (k3, v3)
batched into multiple batch messages:
[(k1, v1), (k1, v2), (k1, v3)], [(k2, v1), (k2, v2), (k2, v3)], [(k3, v1), (k3, v2), (k3, v3)]
```
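The effect on batch counts can be sketched with a small simulation. This is not the Pulsar client's implementation: `BatchGroupingSketch` and its methods are hypothetical, and the sketch only groups the keys seen within one batching window, either into a single batch or into one batch per distinct key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation only; it does not use or mirror Pulsar's internal classes.
class BatchGroupingSketch {
    // Default-style grouping: every message in the window goes into one batch.
    static List<List<String>> singleBatch(List<String> keys) {
        List<List<String>> batches = new ArrayList<>();
        if (!keys.isEmpty()) {
            batches.add(new ArrayList<>(keys));
        }
        return batches;
    }

    // Key-based-style grouping: one batch per distinct key in the window.
    static List<List<String>> batchPerKey(List<String> keys) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String key : keys) {
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(key);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        // Nine messages over three keys, as in the example above.
        List<String> fewKeys = Arrays.asList("k1", "k2", "k3", "k1", "k2", "k3", "k1", "k2", "k3");
        System.out.println(batchPerKey(fewKeys).size()); // 3

        // 48 messages, each with its own key (hypothetical workload).
        List<String> manyKeys = new ArrayList<>();
        for (int i = 0; i < 48; i++) {
            manyKeys.add("user-" + i);
        }
        System.out.println(singleBatch(manyKeys).size()); // 1
        System.out.println(batchPerKey(manyKeys).size()); // 48
    }
}
```

When almost every message in a window has a different key, per-key grouping degenerates into one message per batch, which is the situation described above.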
Because the partitioned producer in the default routing mode already assigns
each keyed message to a particular partition, I don't need
`BatcherBuilder.KEY_BASED` for my use case.
https://pulsar.apache.org/docs/en/admin-api-topics/#routing-mode
> RoundRobinPartition
> If a key is specified on the message, the partitioned producer hashes the
> key and assigns message to a particular partition. This is the default mode.
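The "hash the key, pick a partition" step quoted above can be sketched as follows. This is a sketch under assumptions: `KeyRoutingSketch` and `partitionFor` are hypothetical names, and the hash shown (Java's `String.hashCode()` masked to a non-negative value) stands in for whatever hashing scheme the producer is actually configured with.

```java
// Illustrative only; the real client's hash function may differ from this one.
class KeyRoutingSketch {
    // Map a message key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        int nonNegativeHash = key.hashCode() & Integer.MAX_VALUE; // clear the sign bit
        return nonNegativeHash % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 4; // hypothetical partition count
        // The same key always maps to the same partition, with no special batcher involved.
        System.out.println(partitionFor("k1", partitions) == partitionFor("k1", partitions)); // true
        System.out.println(partitionFor("k2", partitions));
    }
}
```

The point is that key-to-partition affinity comes from routing, which is independent of how messages are grouped into batches.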
For anyone who encounters a similar performance problem, I recommend
checking the actual number of messages per batch with CLI commands such as
`examine-messages`, `peek-messages`, and `get-message-by-id`.
The number of messages in a batch is shown as `X-Pulsar-num-batch-message`.
```
$ docker exec -it pulsar_broker bin/pulsar-admin topics examine-messages --initialPosition latest "persistent://mytenant/mynamespace/mytopic-partition-0" | head
Message ID: 4572594:27489
Tenants:
"X-Pulsar-batch-size 23678"
"X-Pulsar-num-batch-message 48"
...
$ docker exec -it pulsar_broker bin/pulsar-admin topics get-message-by-id --ledgerId 4572594 --entryId 27489 "persistent://mytenant/mynamespace/mytopic-partition-0"
Batch Message ID: 4572594:27489:0
Properties:
"X-Pulsar-batch-size 23678"
"X-Pulsar-num-batch-message 48"
...
$ docker exec -it pulsar_broker bin/pulsar-admin topics peek-messages --subscription mysub1 --count 1 "persistent://mytenant/mynamespace/mytopic-partition-0" | head
Batch Message ID: 4572594:33046:0
Publish time: 1631608014336
Event time: 0
Properties:
"X-Pulsar-batch-size 20086"
"X-Pulsar-num-batch-message 43"
```
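If you want to check this value automatically, the output above can be scanned for the header. The following is a minimal sketch that assumes the output format shown above; `BatchHeaderParser` and `numBatchMessages` are hypothetical names, and the sample text is copied from the output in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for captured pulsar-admin output; not part of any Pulsar tool.
class BatchHeaderParser {
    private static final Pattern NUM_BATCH =
            Pattern.compile("X-Pulsar-num-batch-message\\s+(\\d+)");

    // Returns the first reported batch message count, or -1 if the header is absent.
    static int numBatchMessages(String cliOutput) {
        Matcher matcher = NUM_BATCH.matcher(cliOutput);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    public static void main(String[] args) {
        String sample = "Properties:\n"
                + "\"X-Pulsar-batch-size 23678\"\n"
                + "\"X-Pulsar-num-batch-message 48\"\n";
        System.out.println(numBatchMessages(sample)); // 48
    }
}
```

A value close to 1 across many entries would suggest that batching is not taking effect.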