Thanks to everyone who voted and provided feedback! This KIP is now adopted with 3 binding +1s (Jun, Joel, Becket) and 1 non-binding +1 (Radai).
Dong

On Thu, Apr 27, 2017 at 4:12 PM, Dong Lin <lindon...@gmail.com> wrote:

> Thanks for the vote Jun!
>
> I think that statement is probably OK because it assumes that broker has
> bad log directories. If all log directories are good, the replica should be
> created in one of the good log directories. It is clarified in the wiki
> that "Even if isNewReplica=false and replica is not found on any log
> directory, broker will still create replica on a good log directory if
> there is no bad log directory.".
>
> On Thu, Apr 27, 2017 at 4:07 PM, Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Dong,
>>
>> Thanks for the proposal. +1. Just one minor comment.
>>
>> in "3. Broker bootstraps with bad log directories", when a broker receives
>> a LeaderAndIsrRequest with isNewReplica=False but not found on any good
>> log directory, if all log directories are good, it seems that we should
>> create the replica in one of the good log directories? This can happen if
>> a replica is manually deleted from the log directory.
>>
>> Jun
>>
>> On Wed, Apr 26, 2017 at 11:27 AM, Dong Lin <lindon...@gmail.com> wrote:
>>
>> > Thanks for the vote!
>> >
>> > Discussed with Joel offline. I have updated the KIP to specify that
>> > controller will consider a replica to be offline if
>> > KafkaStorageException is specified for the replica in the
>> > LeaderAndIsrResponse. The other two improvements may be done in the
>> > future KIP.
>> >
>> > On Wed, Apr 26, 2017 at 10:30 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
>> >
>> > > +1
>> > >
>> > > Discussed a few edits/improvements with Dong.
>> > >
>> > > - Rather than a blanket (Error != None) condition for detecting
>> > >   offline replicas you probably want a storage exception-specific
>> > >   error code.
>> > >
>> > > - Definitely in favor of improvement #7 and it shouldn’t be too hard
>> > >   to do. When bouncing with a log directory on a faulty disk, the
>> > >   condition may be detected while loading logs and you may not have
>> > >   the full list of local replicas. So a subsequent L&ISR request would
>> > >   recreate the replica on the good disks (which may or may not be what
>> > >   the user wants).
>> > >
>> > > - Another improvement worth investigating is how best to support
>> > >   partition reassignments even with a bad disk. The wiki hints that
>> > >   this is unnecessary because reassignments being disallowed with an
>> > >   offline replica is similar to the current state of handling an
>> > >   offline broker. With JBOD though the broker with a bad disk does not
>> > >   have to be offline anymore so it should be possible to support
>> > >   reassignments even with offline replicas. I'm not suggesting this is
>> > >   trivial, but would better leverage JBOD.
>> > >
>> > > On Wed, Apr 5, 2017 at 5:46 PM, Becket Qin <becket....@gmail.com> wrote:
>> > >
>> > > > +1
>> > > >
>> > > > Thanks for the KIP. Made a pass and had some minor change.
>> > > >
>> > > > On Mon, Apr 3, 2017 at 3:16 PM, radai <radai.rosenbl...@gmail.com> wrote:
>> > > >
>> > > > > +1, LGTM
>> > > > >
>> > > > > On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin <lindon...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > It seems that there is no further concern with the KIP-112. We
>> > > > > > would like to start the voting process. The KIP can be found at
>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Dong
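
Editorial note: the replica-placement rule that Jun and Dong discuss above can be sketched as a small decision function. This is only an illustration of the rule as worded in this thread, not Kafka's implementation. The function name, the Action enum, and the three boolean parameters are hypothetical. It also assumes that at least one good log directory exists, that a new replica (isNewReplica=true) is always created on a good log directory, and that the broker reports KafkaStorageException when a non-new replica is missing while some log directory is bad.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the possible broker responses.
    USE_EXISTING = "use the replica already present on a good log directory"
    CREATE = "create the replica on a good log directory"
    REPORT_STORAGE_ERROR = "return KafkaStorageException for this replica"


def handle_leader_and_isr(is_new_replica, found_on_good_dir, has_bad_log_dir):
    """Decide what to do with one replica named in a LeaderAndIsrRequest.

    Assumes the broker has at least one good log directory.
    """
    if found_on_good_dir:
        return Action.USE_EXISTING
    # Not found on any good directory. If the replica is new, or if every
    # log directory is good (e.g. the replica was manually deleted), it is
    # safe to create it.
    if is_new_replica or not has_bad_log_dir:
        return Action.CREATE
    # Otherwise the replica may live on a bad directory, so it is reported
    # as offline (assumed here to be signalled via KafkaStorageException).
    return Action.REPORT_STORAGE_ERROR
```

Jun's case (isNewReplica=false, replica not found, all log directories good) corresponds to handle_leader_and_isr(False, False, False), which falls into the CREATE branch, matching the wiki sentence Dong quotes.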