Re: [VOTE] KIP-112 - Handle disk failure for JBOD

Dong Lin Thu, 27 Apr 2017 16:31:30 -0700

Thanks to everyone who voted and provided feedback!

This KIP is now adopted with 3 binding +1s (Jun, Joel, Becket) and 1
non-binding +1 (Radai).


Dong

On Thu, Apr 27, 2017 at 4:12 PM, Dong Lin <lindon...@gmail.com> wrote:

> Thanks for the vote Jun!
>
> I think that statement is probably OK because it assumes that the broker
> has bad log directories. If all log directories are good, the replica should
> be created in one of the good log directories. The wiki clarifies that
> "Even if isNewReplica=false and replica is not found on any log
> directory, broker will still create replica on a good log directory if
> there is no bad log directory."
>
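The placement rule discussed above can be sketched as a small decision function. This is an illustrative Python sketch, not Kafka's broker code: the names `Action` and `decide_replica_action` are hypothetical, and two branches are assumptions rather than statements from this thread (that a new replica is always created on a good log directory, and that a replica missing while some log directory is bad is rejected with a storage error instead of being recreated).

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    USE_EXISTING = "use the replica already present on a good log directory"
    CREATE = "create the replica on a good log directory"
    REJECT = "do not create; report a storage error for the replica"


def decide_replica_action(is_new_replica: bool,
                          found_on_good_dir: bool,
                          bad_log_dirs: Sequence[str]) -> Action:
    """Decide what to do with a replica named in a LeaderAndIsrRequest."""
    if found_on_good_dir:
        # Assumption: an existing replica on a good directory is simply used.
        return Action.USE_EXISTING
    if is_new_replica:
        # Assumption: a new replica is created on a good directory.
        return Action.CREATE
    if not bad_log_dirs:
        # From the wiki text quoted above: isNewReplica=false, replica not
        # found, and no bad log directory means the replica is still created.
        return Action.CREATE
    # Assumption: the replica may live on a bad directory, so it is not
    # recreated elsewhere.
    return Action.REJECT
```

For example, `decide_replica_action(False, False, [])` covers the case Jun raised (a replica manually deleted while all log directories are good) and yields `Action.CREATE`.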
>
> On Thu, Apr 27, 2017 at 4:07 PM, Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Dong,
>>
>> Thanks for the proposal. +1. Just one minor comment.
>>
>> In "3. Broker bootstraps with bad log directories", when a broker receives
>> a LeaderAndIsrRequest with isNewReplica=false for a replica that is not
>> found on any good log directory, if all log directories are good, it seems
>> that we should create the replica in one of the good log directories? This
>> can happen if a replica is manually deleted from the log directory.
>>
>> Jun
>>
>> On Wed, Apr 26, 2017 at 11:27 AM, Dong Lin <lindon...@gmail.com> wrote:
>>
>> > Thanks for the vote!
>> >
>> > Discussed with Joel offline. I have updated the KIP to specify that the
>> > controller will consider a replica to be offline if KafkaStorageException
>> > is specified for the replica in the LeaderAndIsrResponse. The other two
>> > improvements may be done in a future KIP.
>> >
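The difference between the blanket (Error != None) check and the storage-specific check adopted in the KIP can be sketched as follows. This is a hedged Python illustration, not the controller's actual code: the function names, the string error labels, and the dictionary shape of the per-partition response are all assumptions made for the example.

```python
from typing import Mapping, Set

# Hypothetical error labels for this sketch; only KafkaStorageException is
# named in the thread.
NONE = "NONE"
KAFKA_STORAGE_EXCEPTION = "KafkaStorageException"


def offline_replicas_blanket(errors: Mapping[str, str]) -> Set[str]:
    """Blanket rule: any error marks the replica as offline."""
    return {tp for tp, err in errors.items() if err != NONE}


def offline_replicas_storage_only(errors: Mapping[str, str]) -> Set[str]:
    """Adopted rule: only a storage exception marks the replica as offline."""
    return {tp for tp, err in errors.items() if err == KAFKA_STORAGE_EXCEPTION}


# Per-partition errors from one hypothetical LeaderAndIsrResponse.
response_errors = {
    "topicA-0": NONE,
    "topicA-1": KAFKA_STORAGE_EXCEPTION,
    "topicB-0": "SomeOtherError",  # hypothetical non-storage error
}
```

With this input the blanket rule would flag both `topicA-1` and `topicB-0`, while the storage-specific rule flags only `topicA-1`.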
>> >
>> >
>> > On Wed, Apr 26, 2017 at 10:30 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>> >
>> > > +1
>> > >
>> > > Discussed a few edits/improvements with Dong.
>> > >
>> > > - Rather than a blanket (Error != None) condition for detecting offline
>> > > replicas you probably want a storage exception-specific error code.
>> > >
>> > > - Definitely in favor of improvement #7 and it shouldn’t be too hard to do.
>> > > When bouncing with a log directory on a faulty disk, the condition may be
>> > > detected while loading logs and you may not have the full list of local
>> > > replicas. So a subsequent L&ISR request would recreate the replica on the
>> > > good disks (which may or may not be what the user wants).
>> > >
>> > > - Another improvement worth investigating is how best to support partition
>> > > reassignments even with a bad disk. The wiki hints that this is unnecessary
>> > > because reassignments being disallowed with an offline replica is similar
>> > > to the current state of handling an offline broker. With JBOD though the
>> > > broker with a bad disk does not have to be offline anymore so it should be
>> > > possible to support reassignments even with offline replicas. I'm not
>> > > suggesting this is trivial, but would better leverage JBOD.
>> > >
>> > > On Wed, Apr 5, 2017 at 5:46 PM, Becket Qin <becket....@gmail.com> wrote:
>> > >
>> > > > +1
>> > > >
>> > > > Thanks for the KIP. Made a pass and made some minor changes.
>> > > >
>> > > > On Mon, Apr 3, 2017 at 3:16 PM, radai <radai.rosenbl...@gmail.com> wrote:
>> > > >
>> > > > > +1, LGTM
>> > > > >
>> > > > > On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin <lindon...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > It seems that there are no further concerns about KIP-112. We would
>> > > > > > like to start the voting process. The KIP can be found at
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Dong
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
