shenwenbing created KAFKA-10672:
-----------------------------------

             Summary: Restarting Kafka always takes a lot of time
                 Key: KAFKA-10672
                 URL: https://issues.apache.org/jira/browse/KAFKA-10672
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.0.0
         Environment: A cluster of 21 Kafka nodes;
Each node has 12 disks;
Each node has about 1500 partitions;
There are approximately 700 leader partitions per node;
Slow-loading partitions have about 1000 log segments;
            Reporter: shenwenbing
         Attachments: server.log

When the producer snapshot file does not exist, or the latest snapshot file 
is older than the current active segment, restoring producer state requires 
traversing the log segments and reading every batch in them. When a single 
broker hosts many partitions, and therefore many log segments, this causes a 
very large number of I/O operations, because only one batch is loaded at a 
time and a log segment typically holds tens of thousands of batches. In the 
code I found that each batch requires at least two I/O operations. With the 
default batch size of 16 KB and a log segment of 1 GB, a segment contains 
65,536 batches, so at least 65,536 * 2 = 131,072 I/O operations are needed 
per segment, which makes the Kafka startup process very slow. We configured 
15 log recovery threads in the production environment, and it still took more 
than 2 hours to load a partition. Could the community suggest a way to handle 
or improve this situation? For detailed logs, see the entries for 
test-perf-18 in the attached server.log.
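
The estimate above can be reproduced with a short calculation. This is only a 
sketch of the arithmetic, not Kafka code: the function names are made up for 
illustration, the two I/O operations per batch come from the observation 
above, and the 1000-segment case assumes every segment of a slow-loading 
partition has to be scanned, which is an upper bound rather than a measured 
figure.

```python
def batches_per_segment(segment_bytes: int, batch_bytes: int) -> int:
    """Number of batches in one segment, rounding up for a partial batch."""
    return -(-segment_bytes // batch_bytes)


def min_io_ops(segment_bytes: int, batch_bytes: int,
               io_per_batch: int = 2, segments: int = 1) -> int:
    """Lower bound on I/O operations if every batch is read individually."""
    return batches_per_segment(segment_bytes, batch_bytes) * io_per_batch * segments


SEGMENT_BYTES = 1024 ** 3   # 1 GB log segment, as in the report
BATCH_BYTES = 16 * 1024     # 16 KB batch, as in the report

print(batches_per_segment(SEGMENT_BYTES, BATCH_BYTES))        # 65536
print(min_io_ops(SEGMENT_BYTES, BATCH_BYTES))                 # 131072
# Assumed worst case: all ~1000 segments of a partition are scanned.
print(min_io_ops(SEGMENT_BYTES, BATCH_BYTES, segments=1000))  # 131072000
```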



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
