+1

Discussed a few edits/improvements with Dong.

- Rather than a blanket (Error != None) condition for detecting offline
  replicas, you probably want a storage-exception-specific error code.
- Definitely in favor of improvement #7, and it shouldn't be too hard to do.
  When bouncing with a log directory on a faulty disk, the condition may be
  detected while loading logs, and you may not have the full list of local
  replicas. So a subsequent L&ISR request would recreate the replica on the
  good disks (which may or may not be what the user wants).
- Another improvement worth investigating is how best to support partition
  reassignments even with a bad disk. The wiki hints that this is unnecessary
  because disallowing reassignments with an offline replica is similar to how
  an offline broker is handled today. With JBOD, though, the broker with a bad
  disk does not have to be offline anymore, so it should be possible to
  support reassignments even with offline replicas. I'm not suggesting this
  is trivial, but it would better leverage JBOD.

On Wed, Apr 5, 2017 at 5:46 PM, Becket Qin <becket....@gmail.com> wrote:

> +1
>
> Thanks for the KIP. Made a pass and had some minor change.
>
> On Mon, Apr 3, 2017 at 3:16 PM, radai <radai.rosenbl...@gmail.com> wrote:
>
> > +1, LGTM
> >
> > On Mon, Apr 3, 2017 at 9:49 AM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > It seems that there is no further concern with the KIP-112. We would
> > > like to start the voting process. The KIP can be found at
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD
> > >
> > > Thanks,
> > > Dong
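
P.S. For what it's worth, here is a rough sketch of the first point above: treating a replica as offline only on a storage-specific error code rather than on a blanket (Error != None) check. The enum values, class, and method names are made up for illustration and are not Kafka's actual error codes or API; the partition names are just sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OfflineReplicaCheck {
    // Hypothetical error codes for illustration only; not Kafka's real enum.
    enum ErrorCode { NONE, STORAGE_ERROR, NOT_LEADER_FOR_PARTITION, UNKNOWN }

    // Blanket check: any error at all marks the replica offline.
    static boolean isOfflineBlanket(ErrorCode error) {
        return error != ErrorCode.NONE;
    }

    // Narrow check: only a storage-specific error marks the replica offline.
    static boolean isOfflineStorageSpecific(ErrorCode error) {
        return error == ErrorCode.STORAGE_ERROR;
    }

    // Collects the partitions whose per-partition error indicates a disk problem.
    static List<String> offlineReplicas(Map<String, ErrorCode> responseErrors) {
        List<String> offline = new ArrayList<>();
        for (Map.Entry<String, ErrorCode> entry : responseErrors.entrySet()) {
            if (isOfflineStorageSpecific(entry.getValue())) {
                offline.add(entry.getKey());
            }
        }
        return offline;
    }

    public static void main(String[] args) {
        Map<String, ErrorCode> errors = new LinkedHashMap<>();
        errors.put("topicA-0", ErrorCode.NONE);
        errors.put("topicA-1", ErrorCode.STORAGE_ERROR);
        errors.put("topicA-2", ErrorCode.NOT_LEADER_FOR_PARTITION);
        // Only topicA-1 is reported; the blanket check would also flag topicA-2.
        System.out.println(offlineReplicas(errors));
    }
}
```

With the blanket check, a transient non-storage error such as the NOT_LEADER_FOR_PARTITION case here would wrongly mark the replica offline, which is the concern behind asking for a dedicated code.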