Hey Becket,

Thanks for the review.

1. I have thought about this before. I think it is fine to delete the node
right after the controller reads it. On controller failover, the new
controller will always send LeaderAndIsrRequest for all partitions to each
broker in order to learn about offline replicas.
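
To make the ordering concrete, here is a minimal sketch in Java of the
read-then-delete-then-process sequence. All names below are invented for
illustration; this is not the actual controller code.

    // Sketch only: ZkClient stands in for whatever ZooKeeper client the
    // controller uses, and the helper names are hypothetical.
    interface ZkClient {
        byte[] readData(String path);
        void delete(String path);
    }

    class LogDirEventNotificationHandler {
        private final ZkClient zk;

        LogDirEventNotificationHandler(ZkClient zk) {
            this.zk = zk;
        }

        void onNotification(String path) {
            byte[] data = zk.readData(path); // 1. read the notification
            zk.delete(path);                 // 2. delete immediately; this is safe
                                             //    because a failed-over controller
                                             //    re-learns offline replicas from
                                             //    the LeaderAndIsrRequest round
            handleLogDirFailure(data);       // 3. then process the event
        }

        private void handleLogDirFailure(byte[] data) {
            // e.g. update partition state and send LeaderAndIsrRequest
        }
    }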

2. This field is not necessary right now because we currently only use this
znode for the LogDirFailure event. But I envision that it may be useful in
the future, e.g. for fixing a log directory without having to reboot the
broker. I have updated the KIP to specify that we use 1 to indicate the
LogDirFailure event. I think this field makes the znode more general and
extensible. But I am OK with removing this field from the znode for now and
adding it back in the future.
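
For reference, with this change the notification znode's data would look
something like the following (this JSON layout is my reading of the KIP, so
please double-check it against the wiki):

    {"version" : 1, "broker" : 1, "event" : 1}

where event=1 indicates LogDirFailure and other values are reserved for
future event types.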

Thanks,
Dong





On Tue, Mar 7, 2017 at 1:26 PM, Becket Qin <becket....@gmail.com> wrote:

> Hi Dong,
>
> Thanks for the KIP, a few more comments:
>
> 1. In the KIP wiki section "A log directory stops working on a broker
> during runtime", the controller deletes the notification node right after
> it reads the znode. It seems safer to do this last, so that even if the
> controller fails, the new controller will still see the notification.
> 2. In the notification znode we have an Event field as an integer. Can we
> document what the value of LogDirFailure is? And are there any other
> possible values?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Mar 7, 2017 at 11:30 AM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Ismael,
> >
> > Thanks much for taking the time to review the KIP and read through all
> > the discussion!
> >
> > Please see my reply inline.
> >
> > On Tue, Mar 7, 2017 at 9:47 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Dong,
> > >
> > > It took me a while, but I finally went through the whole thread. I
> > > have a few minor comments:
> > >
> > > 1. Regarding the metrics, can we include the full name (e.g.
> > > kafka.cluster:type=Partition,name=InSyncReplicasCount,topic={topic},partition={partition}
> > > as defined in KIP-96)?
> > >
> > Certainly. I have updated the KIP to specify the full name.
> >
> >
> > > 2. We talk about changes in operational procedures for people
> > > switching from RAID to JBOD, but what about people who are already
> > > using JBOD? Since disk failures won't necessarily cause broker
> > > failures, some adjustments may be needed.
> > >
> >
> > Good point. I indeed missed one operational procedure for both existing
> > RAID and JBOD users. I have updated the KIP to specify the following:
> >
> > The administrator will need to detect log directory failure by
> > monitoring OfflineLogDirectoriesCount. After a log directory failure is
> > detected, the administrator needs to fix the disks and reboot the broker.
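> >
> > For example, a small monitoring client could poll that metric over JMX
> > along these lines. This is a sketch only: the JMX host and port are
> > placeholders, and the exact MBean name should be double-checked against
> > the KIP's metrics section.
> >
> >     import javax.management.MBeanServerConnection;
> >     import javax.management.ObjectName;
> >     import javax.management.remote.JMXConnector;
> >     import javax.management.remote.JMXConnectorFactory;
> >     import javax.management.remote.JMXServiceURL;
> >
> >     public class OfflineLogDirCheck {
> >         public static void main(String[] args) throws Exception {
> >             // broker-host:9999 is a placeholder for the broker's JMX endpoint
> >             JMXServiceURL url = new JMXServiceURL(
> >                 "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
> >             try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
> >                 MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
> >                 // assumed MBean name; verify against the KIP
> >                 ObjectName name = new ObjectName(
> >                     "kafka.log:type=LogManager,name=OfflineLogDirectoriesCount");
> >                 Number count = (Number) mbsc.getAttribute(name, "Value");
> >                 if (count.intValue() > 0) {
> >                     System.err.println("Offline log directories: " + count);
> >                 }
> >             }
> >         }
> >     }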
> >
> >
> > >
> > > 3. Another point regarding operational procedures: with a large
> > > enough cluster, disk failures may not be that uncommon. It may be
> > > worth explaining the recommended procedure if someone needs to do a
> > > rolling bounce of a cluster with some bad disks. One option is to
> > > simply do the bounce and hope that the bad disks are detected during
> > > restart, but we know that this is not guaranteed to happen
> > > immediately. A better option may be to remove the bad log dirs from
> > > the broker config until the disk is replaced.
> > >
> >
> > I am not sure if I understand your suggestion here. I think the user
> > doesn't need to differentiate between a log directory failure during a
> > rolling bounce and a log directory failure during runtime. All they need
> > to do is detect and handle the log directory failure as specified above.
> > And they don't have to remove the bad log directory from the broker
> > config immediately. The only drawback of keeping the log directory there
> > is that a new replica may not be created on the broker. But the chance
> > of that happening is really low, since the controller has to fail in the
> > small window after the user initiated the topic creation but before it
> > sends LeaderAndIsrRequest with is_new_replica=true to the broker. In
> > practice this shouldn't matter.
> >
> >
> > >
> > > 4. The test plan doesn't mention the number of log directories per
> > > broker. It could be good to specify this. Also, we seem to create one
> > > topic with one partition, which means that only one log directory will
> > > be populated. It seems like we should have partitions in more than one
> > > log directory to verify that the failed log directory doesn't affect
> > > the ones that are still good.
> > >
> >
> > Sure. I have updated the test description to specify that each broker
> > will have two log directories.
> >
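> > For example, each broker in the test could be started with two log
> > directories configured along these lines (the paths here are arbitrary
> > examples):
> >
> >     log.dirs=/tmp/kafka-logs-1,/tmp/kafka-logs-2
> >
> > and the test would then make one of the two directories inaccessible.
> >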
> > The existing test case will actually create two topics to validate that
> > a failed log directory won't affect the good ones. You can find them
> > after "Now validate that the previous leader can still serve replicas on
> > the good log directories" and "Now validate that the follower can still
> > serve replicas on the good log directories".
> >
> >
> > >
> > > 5. In the protocol definition, we have isNewReplica, but it should
> > > probably be is_new_replica.
> > >
> >
> > Good point. My bad. It is fixed now.
> >
> >
> > >
> > > Thanks,
> > > Ismael
> > >
> > >
> > > On Thu, Jan 12, 2017 at 6:46 PM, Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We created KIP-112: Handle disk failure for JBOD. Please find the
> > > > KIP wiki at
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD.
> > > >
> > > > This KIP is related to KIP-113
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories>:
> > > > Support replicas movement between log directories. Both are needed
> > > > in order to support JBOD in Kafka. Please help review the KIP. Your
> > > > feedback is appreciated!
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > >
> >
>
