Re: [VOTE] KIP-112 - Handle disk failure for JBOD

Joel Koshy Wed, 26 Apr 2017 10:31:14 -0700

+1

Discussed a few edits/improvements with Dong.


- Rather than a blanket (Error != None) condition for detecting offline
replicas, you probably want an error code specific to storage exceptions.

- Definitely in favor of improvement #7 and it shouldn’t be too hard to do.
When bouncing with a log directory on a faulty disk, the condition may be
detected while loading logs and you may not have the full list of local
replicas. So a subsequent L&ISR request would recreate the replica on the
good disks (which may or may not be what the user wants).

- Another improvement worth investigating is how best to support partition
reassignments even with a bad disk. The wiki hints that this is unnecessary
because reassignments being disallowed with an offline replica is similar
to the current state of handling an offline broker. With JBOD though the
broker with a bad disk does not have to be offline anymore so it should be
possible to support reassignments even with offline replicas. I'm not
suggesting this is trivial, but it would better leverage JBOD.
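
To illustrate the first point above, here is a rough sketch (not code from
the KIP; the error names and the shape of the response map are hypothetical)
of how a blanket check and a storage-specific check would differ when
deciding which replicas are offline:

```python
from enum import Enum


class Error(Enum):
    """Hypothetical per-partition error codes, for illustration only."""
    NONE = 0
    NOT_LEADER_FOR_PARTITION = 1
    UNKNOWN_TOPIC_OR_PARTITION = 2
    STORAGE_ERROR = 3  # assumed name for a disk-failure-specific code


def offline_replicas_blanket(responses):
    """Blanket check: any error at all marks the replica as offline."""
    return sorted(p for p, err in responses.items() if err is not Error.NONE)


def offline_replicas_specific(responses):
    """Narrow check: only a storage error marks the replica as offline."""
    return sorted(p for p, err in responses.items() if err is Error.STORAGE_ERROR)


# Hypothetical per-partition errors returned by one broker.
responses = {
    "topicA-0": Error.NONE,
    "topicA-1": Error.STORAGE_ERROR,
    "topicB-0": Error.NOT_LEADER_FOR_PARTITION,
}

print(offline_replicas_blanket(responses))
print(offline_replicas_specific(responses))
```

With the blanket check, topicB-0 would be treated as offline even though its
error has nothing to do with the disk; the specific check flags only
topicA-1.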

On Wed, Apr 5, 2017 at 5:46 PM, Becket Qin <becket....@gmail.com> wrote:

> +1
>
> Thanks for the KIP. Made a pass and had some minor changes.
>
> On Mon, Apr 3, 2017 at 3:16 PM, radai <radai.rosenbl...@gmail.com> wrote:
>
> > +1, LGTM
> >
> > On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > It seems that there is no further concern with the KIP-112. We would like
> > > to start the voting process. The KIP can be found at
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD
> > >
> > > Thanks,
> > > Dong
> > >
> >
>
