Re: [GitHub] kafka pull request: Parallel log-recovery of un-flushed segments o...

Achanta Vamsi Subhash Thu, 10 Mar 2016 10:40:19 -0800

Hi,
I would like to get this into 0.10.0.0, so could someone please take a look
and review it?


On Wed, Mar 9, 2016 at 10:29 PM, Achanta Vamsi Subhash <
[email protected]> wrote:

> Hi all,
>
> https://github.com/apache/kafka/pull/1035
> This pull request makes log-segment loading parallel, controlled by two
> configurable properties: "log.recovery.threads" and
> "log.recovery.max.interval.ms".
>
> On startup, the log segments within a logDir are currently loaded
> sequentially after an unclean shutdown. This takes a long time because
> logSegment.recover(..) is called for every segment, and for brokers with
> many partitions the total time is very high (we have seen ~40 minutes for
> 2k partitions).
>
> Logic:
> 1. Define a fixed-size thread pool (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add the
> returned future to a job list.
> 3. Wait until all the jobs finish within the required time (
> log.recovery.max.interval.ms - default set to Long.Max).
> 4. If they finish and every future returns null (meaning that the jobs
> completed successfully), recovery is considered done.
> 5. If any of the recovery jobs fails, the failure is logged and
> LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
> The logic is backward compatible with the current sequential
> implementation because the default thread count is set to 1.
>
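
The six steps quoted above can be sketched roughly as follows. This is a
minimal illustration in plain Java, not the code from the pull request: the
class ParallelRecoverySketch, the method recoverAll, and the use of Runnable
jobs in place of logSegment.recover(..) are all assumptions made for the
example, and LogRecoveryFailedException is redefined here as a hypothetical
stand-in for the exception named in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the exception named in the pull request. */
class LogRecoveryFailedException extends RuntimeException {
    LogRecoveryFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ParallelRecoverySketch {
    /**
     * Runs every recovery job on a fixed-size pool and waits for all of them,
     * spending at most maxIntervalMs in total (assumed to be an overall budget).
     */
    static void recoverAll(List<Runnable> segmentJobs, int threads, long maxIntervalMs) {
        // Step 1: fixed-size pool (stands in for log.recovery.threads).
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Step 2: submit each job and remember its future.
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable job : segmentJobs) {
                futures.add(pool.submit(job));
            }
            // Step 3: wait for every future within the remaining time budget.
            long start = System.nanoTime();
            long budgetNanos = TimeUnit.MILLISECONDS.toNanos(maxIntervalMs); // saturates, no overflow
            for (Future<?> future : futures) {
                long remaining = budgetNanos - (System.nanoTime() - start);
                try {
                    // Step 4: a Runnable's future yields null on success.
                    future.get(Math.max(remaining, 0L), TimeUnit.NANOSECONDS);
                } catch (ExecutionException e) {
                    // Step 5: a job threw; surface the original cause.
                    throw new LogRecoveryFailedException("segment recovery failed", e.getCause());
                } catch (TimeoutException e) {
                    // Step 6: the time budget ran out.
                    throw new LogRecoveryFailedException("segment recovery timed out", e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new LogRecoveryFailedException("interrupted while waiting", e);
                }
            }
        } finally {
            // Stop the workers; this also interrupts jobs still running after a failure.
            pool.shutdownNow();
        }
    }
}
```

With threads = 1 the jobs run one after another in submission order, which
is why a default of 1 behaves like the sequential path. The sketch treats the
timeout as one budget shared by all jobs; the mail does not say whether the
PR applies it overall or per job.
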
> JIRA link is here:
> https://issues.apache.org/jira/browse/KAFKA-3359
>
> Please review and share any suggestions; I will incorporate them.
> Thanks.
>
>
> On Wed, Mar 9, 2016 at 7:57 PM, vamsi-subhash <[email protected]> wrote:
>
>> GitHub user vamsi-subhash opened a pull request:
>>
>>     https://github.com/apache/kafka/pull/1035
>>
>>     Parallel log-recovery of un-flushed segments on startup
>>
>>     Did not find any tests for the method. Will be adding them
>>
>> You can merge this pull request into a Git repository by running:
>>
>>     $ git pull https://github.com/vamsi-subhash/kafka trunk
>>
>> Alternatively you can review and apply these changes as the patch at:
>>
>>     https://github.com/apache/kafka/pull/1035.patch
>>
>> To close this pull request, make a commit to your master/trunk branch
>> with (at least) the following in the commit message:
>>
>>     This closes #1035
>>
>> ----
>> commit ecab815203a2b6396703660d5a2f9d9bb00efcf3
>> Author: Vamsi Subhash Achanta <[email protected]>
>> Date:   2016-03-09T14:24:37Z
>>
>>     Made log-recovery parallel
>>
>> ----
>>
>>
>>
>
>
>
> --
> Regards
> Vamsi Subhash
>



-- 
Regards
Vamsi Subhash
