I will change the wording to reflect this. But yes, a broker follower should only enter the ISR once it is fully caught up.
Caught up means that the follower has read from the log end offset from the broker. I'm using the log end offset from before the actual read operation to avoid these off-by-one errors. In any case, I plan to run this locally with a small cluster and see how it performs.

Aditya

________________________________________
From: Joe Stein [joe.st...@stealth.ly]
Sent: Thursday, March 12, 2015 1:54 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP 16 - Replica lag tuning

Hi Aditya, thanks for the writeup.

Let's say a broker follower goes down, and it is down for an hour or two. When the broker follower comes back up it will start sending fetch requests (let's say every 2ms, which would be under a configured, let's say, 100ms (whatever)). Then right away the broker gets added back to the ISR?

Maybe it is just the wording or how I am reading it... I think/thought that once the replica is caught up THEN the setting goes into action, and as long as (every 100ms ... whatever) the broker leader is seeing the broker follower as "caught up" then it is in the ISR.

Also, what is the definition of "caught up" now without the number of messages? If it is === I worry about that not happening in some networks where it is always off by one or something maybe?

~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Mar 12, 2015 at 4:36 PM, Aditya Auradkar <aaurad...@linkedin.com.invalid> wrote:

> I wrote a KIP for this after some discussion on KAFKA-1546.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+16+:+Automated+Replica+Lag+Tuning
>
> The RB is here: https://reviews.apache.org/r/31967/
>
> Thanks,
> Aditya
>
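
For readers following the thread, the behaviour discussed above can be sketched roughly as follows. This is only an illustrative model, not the code in the RB: the class names `LeaderPartition` and `FollowerState`, the `max_lag_ms` parameter, and the explicit `now_ms` arguments are all assumptions made for the example. It shows a follower joining the ISR only once its fetch offset reaches the leader's log end offset (captured before the read), and leaving the ISR if it has not been caught up within the configured time.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FollowerState:
    in_isr: bool = False
    # Time of the last fetch at which the follower was caught up (None = never).
    last_caught_up_ms: Optional[int] = None


@dataclass
class LeaderPartition:
    # Hypothetical stand-in for the time-based lag setting discussed in the thread.
    max_lag_ms: int
    log_end_offset: int = 0
    followers: Dict[str, FollowerState] = field(default_factory=dict)

    def append(self, count: int = 1) -> None:
        """Simulate the leader appending `count` messages to its log."""
        self.log_end_offset += count

    def on_fetch(self, follower_id: str, fetch_offset: int, now_ms: int) -> bool:
        """Handle a follower fetch; return whether the follower is in the ISR."""
        # Capture the log end offset before serving the read, so a concurrent
        # append does not make a caught-up follower look one message behind.
        leo_before_read = self.log_end_offset
        state = self.followers.setdefault(follower_id, FollowerState())
        if fetch_offset >= leo_before_read:
            state.last_caught_up_ms = now_ms
            state.in_isr = True  # only a caught-up follower may (re)join
        return state.in_isr

    def shrink_isr(self, now_ms: int) -> None:
        """Drop followers that have not been caught up within max_lag_ms."""
        for state in self.followers.values():
            if not state.in_isr:
                continue
            if (state.last_caught_up_ms is None
                    or now_ms - state.last_caught_up_ms > self.max_lag_ms):
                state.in_isr = False


leader = LeaderPartition(max_lag_ms=100)
leader.append(1000)
# A restarted follower fetching frequently is still not in the ISR
# until its fetch offset reaches the leader's log end offset.
print(leader.on_fetch("f1", 0, now_ms=0))
print(leader.on_fetch("f1", 1000, now_ms=4))
```

In this sketch, frequent fetches alone never add a follower to the ISR, which is the case Joe raised; the time limit only decides when a follower that was previously caught up gets removed.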