[
https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shenwenbing updated KAFKA-10672:
--------------------------------
Attachment: server.log
> Restarting Kafka always takes a lot of time
> -------------------------------------------
>
> Key: KAFKA-10672
> URL: https://issues.apache.org/jira/browse/KAFKA-10672
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.0.0
> Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
> Reporter: shenwenbing
> Priority: Major
> Attachments: server.log
>
>
> When the producer snapshot file does not exist, or the latest snapshot
> predates the current active segment, restoring producer state traverses the
> log segments and reads every batch in them. When a single broker hosts many
> partitions, most of them with many log segments, this causes a very large
> number of IO operations, because the IO loads only one batch at a time and a
> log segment often contains tens of thousands of batches.
> I found that the code performs at least two IO operations per batch. With
> the default batch size of 16 KB and a log segment of 1 GB, a segment holds
> 65,536 batches, so at least 65,536 * 2 = 131,072 IO operations are needed
> for that segment, which makes the Kafka startup process take a long time.
> We configured 15 log recovery threads in the production environment, and it
> still took more than 2 hours to load a partition. Can the community propose
> a solution or an improvement for this situation? For detailed logs, see the
> entries for the test-perf-18 partitions in the attached server.log.
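
The arithmetic in the description can be checked with a short sketch. It is only a back-of-the-envelope estimate, not Kafka code: the function name `estimate_recovery_io` and its parameters are hypothetical, the figure of two IO operations per batch is the reporter's observation, and the per-partition total assumes that all of the roughly 1000 segments are full 1 GB segments that must be scanned, which may overstate the real cost.

```python
def estimate_recovery_io(segment_bytes, batch_bytes, io_per_batch=2):
    """Estimate batches and IO operations needed to scan one log segment.

    Hypothetical helper for illustration only; io_per_batch=2 is the
    lower bound reported in the issue, not a value taken from Kafka.
    """
    batches = segment_bytes // batch_bytes
    return batches, batches * io_per_batch


# 1 GB segment filled with default-size 16 KB batches, as in the report.
batches, io_ops = estimate_recovery_io(1024 ** 3, 16 * 1024)
print(batches, io_ops)

# Assumption: a slow-loading partition has about 1000 such segments
# and every one of them is traversed during recovery.
segments_per_partition = 1000
print(io_ops * segments_per_partition)
```

Under these assumptions one segment needs 65,536 batch reads and at least 131,072 IO operations, and a partition with 1000 full segments needs on the order of 131 million.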
--
This message was sent by Atlassian Jira
(v8.3.4#803005)