Hi Dong,

Thanks for the writeup! It's very interesting.

I apologize in advance if this has been discussed somewhere else, but I am curious whether you have considered running multiple brokers per node. Clearly there is a memory overhead with this solution because of the fixed cost of starting multiple JVMs. However, running multiple JVMs would help avoid scalability bottlenecks. You could probably push more RPCs per second, for example. A garbage collection pause in one broker would not affect the others.

It would be interesting to see this considered in the "alternate designs" section, even if you end up deciding it's not the way to go.

best,
Colin

On Thu, Jan 12, 2017, at 10:46, Dong Lin wrote:
> Hi all,
>
> We created KIP-112: Handle disk failure for JBOD. Please find the KIP wiki
> in the link https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD.
>
> This KIP is related to KIP-113
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-113%3A+Support+replicas+movement+between+log+directories>:
> Support replicas movement between log directories. They are needed in order
> to support JBOD in Kafka. Please help review the KIP. Your feedback is
> appreciated!
>
> Thanks,
> Dong