Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

Ismael Juma Tue, 14 Mar 2017 10:32:38 -0700

Thanks Dong. Comments inline.

On Fri, Mar 10, 2017 at 6:25 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> I get your point. But I am not sure we should recommend that users simply
> remove the disk from the broker config. If a user does this without
> checking the utilization of the good disks, replicas on the bad disk will
> be re-created on the good disks and may overload them, causing
> cascading failure.
>


Good point.
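
To make the concern concrete, here is a rough sketch of the kind of check
an operator might run before dropping a bad disk from the broker config.
Everything in it is hypothetical: the function name, the paths, the numbers
and the 80% threshold are illustrative, and it assumes the bad disk's
replica bytes are spread evenly over the remaining disks, which is a
simplification rather than how the broker actually places replicas.

```python
def can_absorb(good_disks, bad_disk_bytes, max_utilization=0.8):
    """Return True if every good disk stays at or below max_utilization
    after the bad disk's replica bytes are spread evenly across them.

    good_disks maps a log directory path to a (used, capacity) pair.
    Even spreading is an assumption made for this sketch only.
    """
    if not good_disks:
        # No healthy disk left to take the replicas.
        return False
    extra_per_disk = bad_disk_bytes / len(good_disks)
    return all(
        (used + extra_per_disk) / capacity <= max_utilization
        for used, capacity in good_disks.values()
    )


# Hypothetical broker: two good 1000 GB disks, 300 GB of replicas on the bad one.
good = {"/data/d1": (600, 1000), "/data/d2": (700, 1000)}
print(can_absorb(good, 300))
```

With these numbers the second disk would end up at 85%, so the check fails
and the replicas should be moved elsewhere before the disk is removed.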


>
> I agree with you and Colin that a slow disk may cause problems. However,
> performance degradation due to a slow disk is an existing problem that
> is not detected or handled by Kafka or KIP-112.


I think an important difference is that a number of disk errors are
currently fatal and won't be after KIP-112. So it introduces new scenarios
(for example, bouncing a broker that is working fine although some disks
have been marked bad).


> Detection and handling of
> slow disks is a separate problem that needs to be addressed in a future KIP.
> It is currently listed in the future work. Does this sound OK?
>

I'm OK with it being handled in the future. In the meantime, I was just
hoping that we can make users aware of the potential issue of a
disk marked as bad becoming good again after a bounce (which can be
dangerous).

> The main benefit of creating the second topic after a log directory goes
> offline is that we can make sure the second topic is created on the good
> log directory. I am not sure we can simply assume that the first topic will
> always be created on the first log directory in the broker config and the
> second topic will be created on the second log directory in the broker
> config.



> However, I can add this test in KIP-113 which allows user to
> re-assign replica to specific log directory of a broker. Is this OK?
>

OK. Please add a note to KIP-112 about this as well (so that it's clear why
we only do it in KIP-113).

Ismael
