Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

Dong Lin Fri, 10 Mar 2017 10:26:15 -0800

Hey Ismael,

Thanks for your comments. Please see my reply below.

On Fri, Mar 10, 2017 at 9:12 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Dong,
>
> Thanks for the updates, they look good. A couple of comments below.
>
> On Tue, Mar 7, 2017 at 7:30 PM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > >
> > > 3. Another point regarding operational procedures, with a large enough
> > > cluster, disk failures may not be that uncommon. It may be worth
> > explaining
> > > the recommended procedure if someone needs to do a rolling bounce of a
> > > cluster with some bad disks. One option is to simply do the bounce and
> > hope
> > > that the bad disks are detected during restart, but we know that this
> is
> > > not guaranteed to happen immediately. A better option may be to remove
> > the
> > > bad log dirs from the broker config until the disk is replaced.
> > >
> >
> > I am not sure if I understand your suggestion here. I think user doesn't
> > need to differentiate between log directory failure during rolling bounce
> > and log directory failure during runtime. All they need to do is to
> detect
> > and handle log directory failure specified above. And they don't have to
> > remove the bad log directory immediately from broker config. The only
> > drawback of keeping log directory there is that a new replica may not be
> > created on the broker. But the chance of that happening is really low,
> > since the controller has to fail in a small window after user initiated
> the
> > topic creation but before it sends LeaderAndIsrRequest with
> > is_new_replica=true to the broker. In practice this shouldn't matter.
> >
>
>  Let me try to clarify what I mean. The document states that a broker
> assumes that a log directory is good if it can read from it when it starts.
> So, bouncing a broker with a bad disk without doing anything is a bit
> dangerous because it may be considered good again and cause issues due to
> slow performance, for example. As Colin pointed out, this is not uncommon.
> So, perhaps we should state that it is safer to remove the bad log dir from
> the broker config if a bounce is required before the disk is fixed. Does
> that make sense?
>

I get your point. But I am not sure we should recommend that users simply
remove the bad disk from the broker config. If a user does this without
checking the utilization of the good disks, the replicas on the bad disk
will be re-created on the good disks and may overload them, causing
cascading failures.
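
As a rough illustration of the utilization check mentioned above, here is a
minimal sketch. It is not part of KIP-112 or of any Kafka tool: the function
name, the dict-based inputs and the 80% threshold are all assumptions made
for the example. It also only checks aggregate capacity, so it is optimistic
about how replicas would actually be spread across the good log directories.

```python
def can_absorb_bad_disk(used, capacity, bad_dir, max_utilization=0.8):
    """Hypothetical pre-check before removing bad_dir from the broker config.

    used / capacity map each log directory to bytes used / total bytes.
    Returns True if the data on bad_dir, once re-created on the remaining
    log directories, keeps their combined utilization under the threshold.
    """
    good_dirs = [d for d in capacity if d != bad_dir]
    if not good_dirs:
        # No good log directory is left to host the re-created replicas.
        return False
    good_capacity = sum(capacity[d] for d in good_dirs)
    used_after = sum(used[d] for d in good_dirs) + used[bad_dir]
    return used_after <= max_utilization * good_capacity


# Illustrative numbers only (arbitrary units).
used = {"/data1": 600, "/data2": 500, "/data3": 700}
capacity = {"/data1": 1000, "/data2": 1000, "/data3": 1000}
print(can_absorb_bad_disk(used, capacity, "/data3"))  # False: 1800 > 1600
```

In this example the two good disks would end up at 90% combined utilization,
which is the overload case described above.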

I agree with you and Colin that a slow disk may cause problems. However,
performance degradation due to a slow disk is an existing problem that is
not detected or handled by either Kafka or KIP-112. Detection and handling
of slow disks is a separate problem that needs to be addressed in a future
KIP. It is currently listed under future work. Does this sound OK?

>
> > Sure. I have updated the test description to specify that each broker will
> > have two log directories.
> >
> > The existing test case will actually create 2 topics to validate that
> > failed log directory won't affect the good ones. You can find them after
> > "Now validate that the previous leader can still serve replicas on the
> good
> > log directories" and "Now validate that the follower can still serve
> > replicas on the good log directories".
>
>
> The current plan suggests creating a second topic after the log directory
> has been marked as bad via the permission change. I am suggesting that we
> should ideally have more than one topic (or partition) before the log
> directory is marked as bad. Both cases are important and should be tested,
> in my opinion.
>

It is simpler to have multiple topics of one partition each than a single
topic with multiple partitions. In the latter case, some partitions of the
topic may be offline, so we cannot simply consume from the topic to
validate that the partitions on the good disks can be consumed.
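
The difference between the two layouts can be shown with a small sketch.
This is illustrative only and not taken from the KIP-112 test plan: the
helper name, the topic names and the placement map are hypothetical.

```python
def fully_online_topics(placement, offline_dirs):
    """Return the topics whose partitions all live on online log directories.

    placement maps topic -> {partition id -> log directory} for one broker.
    Only these topics can be validated by simply consuming the whole topic.
    """
    return sorted(
        topic
        for topic, partitions in placement.items()
        if all(log_dir not in offline_dirs for log_dir in partitions.values())
    )


# Hypothetical layout: one two-partition topic and two single-partition topics.
placement = {
    "multi": {0: "/data1", 1: "/data2"},
    "single-a": {0: "/data1"},
    "single-b": {0: "/data2"},
}
print(fully_online_topics(placement, offline_dirs={"/data1"}))  # ['single-b']
```

With /data1 offline, "single-b" can be validated by consuming it end to
end, while "multi" has one offline partition and would need per-partition
handling in the test.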

The main benefit of creating the second topic after the log directory goes
offline is that we can be sure the second topic is created on the good log
directory. I am not sure we can simply assume that the first topic will
always be created on the first log directory in the broker config and the
second topic on the second log directory in the broker config. However, I
can add this test in KIP-113, which allows users to reassign a replica to a
specific log directory of a broker. Is this OK?

>
> Ismael
>
