Vamsi Subhash Achanta created KAFKA-3359:
--------------------------------------------

             Summary: Parallel log-recovery of un-flushed segments on startup
                 Key: KAFKA-3359
                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
             Project: Kafka
          Issue Type: Bug
          Components: log
    Affects Versions: 0.9.0.1, 0.8.2.2
            Reporter: Vamsi Subhash Achanta
            Assignee: Jay Kreps
            Priority: Minor


On startup after an unclean shutdown, the log segments within a logDir are 
currently loaded sequentially. Loading takes a long time because 
logSegment.recover(..) is called for every segment; for brokers with many 
partitions the total time is very high (we have observed ~40 mins for 2k 
partitions).

https://github.com/apache/kafka/pull/1035

This pull request parallelizes log-segment loading, controlled by two 
configurable properties: "log.recovery.threads" and 
"log.recovery.max.interval.ms".

Logic:
1. Create a thread pool of fixed size (log.recovery.threads).
2. Submit each logSegment recovery as a job to the thread pool and add the 
returned future to a job list.
3. Wait until all jobs are done, within the allowed time 
(log.recovery.max.interval.ms - default set to Long.Max).
4. If all jobs are done and the futures are all null (meaning the jobs 
completed successfully), recovery is considered done.
5. If any recovery job fails, the failure is logged and 
LogRecoveryFailedException is thrown.
6. If the timeout is reached, LogRecoveryFailedException is thrown.

The logic is backward compatible with the current sequential implementation, 
as the default thread count is set to 1.
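The steps above can be sketched in plain Java (all names here are hypothetical stand-ins, not the actual PR code: `recoverSegment` substitutes for `logSegment.recover(..)`, and the two method parameters mirror the proposed "log.recovery.threads" and "log.recovery.max.interval.ms" properties):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParallelRecoverySketch {

    // Hypothetical stand-in for logSegment.recover(..); here it just
    // returns the segment id so the result is easy to inspect.
    static int recoverSegment(int segmentId) {
        return segmentId;
    }

    static List<Integer> recoverAll(List<Integer> segments,
                                    int recoveryThreads,
                                    long maxIntervalMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        // Step 1: fixed-size thread pool (log.recovery.threads).
        ExecutorService pool = Executors.newFixedThreadPool(recoveryThreads);
        try {
            // Step 2: submit each segment recovery and collect the futures.
            List<Future<Integer>> futures = new ArrayList<>();
            for (int seg : segments) {
                futures.add(pool.submit(() -> recoverSegment(seg)));
            }
            // Step 3: wait for all jobs within the overall deadline
            // (log.recovery.max.interval.ms).
            long deadline = System.currentTimeMillis() + maxIntervalMs;
            List<Integer> recovered = new ArrayList<>();
            for (Future<Integer> f : futures) {
                long remaining = deadline - System.currentTimeMillis();
                // Steps 5/6: get() rethrows a failed recovery job's
                // exception, or throws TimeoutException at the deadline;
                // the real PR would wrap these in LogRecoveryFailedException.
                recovered.add(f.get(Math.max(remaining, 1), TimeUnit.MILLISECONDS));
            }
            // Step 4: every future completed, so recovery is done.
            return recovered;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recoverAll(Arrays.asList(1, 2, 3, 4), 2, 60_000L));
    }
}
```

With recoveryThreads set to 1 the pool runs one job at a time, which matches the backward-compatible sequential default described above.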

PS: I am new to Scala and the code might look Java-ish, but I will be happy to 
modify the code per review comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
