Hi, Dong,

Thanks for the proposal. A few quick questions/comments.

1. Do you know why your stress test loses 15% of the throughput with the
one-broker-per-disk setup?

2. In the KIP, it wasn't super clear to me what
/broker/topics/[topic]/partitions/[partitionId]/controller_managed_state
represents and when it's updated. It seems that this path should be updated
every time the disk associated with one of the replicas goes bad or is
fixed. However, the wiki only mentions updating this path when a new topic
is created. It would be useful to provide a high-level description of what
this path stores, when it's updated, and by whom.
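
To make the question concrete, here is a minimal sketch (my own, not from
the KIP) of inspecting that path with the plain ZooKeeper client; the
payload shown in the comment is purely a guess at what the state might
contain:

    // Minimal sketch: read the proposed znode with the plain ZooKeeper
    // client. The JSON in the comment below is an assumption for
    // illustration only; the KIP does not specify the payload format.
    import org.apache.zookeeper.ZooKeeper;

    public class InspectControllerManagedState {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
            String path =
                "/broker/topics/test/partitions/0/controller_managed_state";
            byte[] data = zk.getData(path, false, null);
            // e.g. {"version":1,"offline_replicas":[2]}  <-- hypothetical
            System.out.println(new String(data, "UTF-8"));
            zk.close();
        }
    }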

3. The proposal uses ZK to propagate disk failure/recovery from the broker
to the controller. I am not sure this is the best approach in the long
term. Might it be better for the broker to send RPC requests directly to
the controller?
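
For comparison, the direct-RPC alternative might look roughly like the
following; every name and field below is hypothetical and only illustrates
the shape of the idea, since no such request exists today:

    // Hypothetical broker -> controller notification, for illustration
    // only; this is not an existing Kafka request type.
    public class LogDirStateChangeRequest {
        private final int brokerId;
        private final String logDir;   // e.g. "/data/disk3"
        private final boolean online;  // false = disk failed, true = recovered

        public LogDirStateChangeRequest(int brokerId, String logDir,
                                        boolean online) {
            this.brokerId = brokerId;
            this.logDir = logDir;
            this.online = online;
        }

        public int brokerId() { return brokerId; }
        public String logDir() { return logDir; }
        public boolean online() { return online; }
    }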

Jun


On Wed, Jan 25, 2017 at 1:50 PM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Colin,
>
> Good point! Yeah, we have actually considered and tested this solution,
> which we call one-broker-per-disk. It would work and should require no
> major change in Kafka compared to this JBOD KIP, so it would be a good
> short-term solution.
>
> But it has a few drawbacks which make it less desirable in the long term.
> Assume we have 10 disks on a machine. Here are the problems:
>
> 1) Our stress test results show that one-broker-per-disk has 15% lower
> throughput.
>
> 2) The controller would need to send 10X as many LeaderAndIsrRequest,
> MetadataUpdateRequest and StopReplicaRequest messages. This increases the
> burden on the controller, which can become the performance bottleneck.
>
> 3) Less efficient use of physical resources on the machine. The number of
> sockets on each machine will increase by 10X, and the number of connections
> between any two machines will increase by 100X (10 brokers on each side
> means 10 x 10 inter-broker links; see the sketch after this list).
>
> 4) Less efficient memory and quota management.
>
> 5) Rebalancing between disks/brokers on the same machine will be less
> efficient and less flexible. A broker would have to read data from another
> broker on the same machine via a socket. It would also be harder to do
> automatic load balancing between disks on the same machine in the future.
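>
> To make 3) concrete, here is a back-of-the-envelope sketch (my own
> illustration, not from the KIP; it assumes a full mesh of inter-broker
> connections between two machines, which is the worst case when partitions
> are spread across all brokers):
>
>     // Rough illustration of the connection amplification in point 3).
>     // Assumes every broker on machine A connects to every broker on
>     // machine B.
>     public class ConnectionMath {
>         public static void main(String[] args) {
>             int brokersPerMachine = 10;
>             // one broker per machine: a single inter-broker link
>             int baseline = 1;
>             // one broker per disk: 10 brokers on A x 10 brokers on B
>             int perDisk = brokersPerMachine * brokersPerMachine;
>             System.out.println(perDisk / baseline + "X connections"); // 100X
>         }
>     }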
>
> I will put this and the explanation in the rejected alternatives section. I
> have a few questions:
>
> - Can you explain why this solution can help avoid the scalability
> bottleneck? I actually think it will exacerbate the scalability problem due
> to 2) above.
> - Why could we push more RPCs with this solution?
> - It is true that a garbage collection in one broker would not affect
> others. But that is only after every broker is limited to 1/10 of the
> memory. Can we be sure that this will actually help performance?
>
> Thanks,
> Dong
>
> On Wed, Jan 25, 2017 at 11:34 AM, Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Dong,
> >
> > Thanks for the writeup!  It's very interesting.
> >
> > I apologize in advance if this has been discussed somewhere else.  But I
> > am curious if you have considered the solution of running multiple
> > brokers per node.  Clearly there is a memory overhead with this solution
> > because of the fixed cost of starting multiple JVMs.  However, running
> > multiple JVMs would help avoid scalability bottlenecks.  You could
> > probably push more RPCs per second, for example.  A garbage collection
> > in one broker would not affect the others.  It would be interesting to
> > see this considered in the "alternate designs" section, even if you end
> > up deciding it's not the way to go.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Jan 12, 2017, at 10:46, Dong Lin wrote:
> > > Hi all,
> > >
> > > We created KIP-112: Handle disk failure for JBOD. Please find the KIP
> > > wiki
> > > in the link https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 112%3A+Handle+disk+failure+for+JBOD.
> > >
> > > This KIP is related to KIP-113
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 113%3A+Support+replicas+movement+between+log+directories>:
> > > Support replicas movement between log directories. They are needed in
> > > order
> > > to support JBOD in Kafka. Please help review the KIP. Your feedback is
> > > appreciated!
> > >
> > > Thanks,
> > > Dong
> >
>
