Thanks Dong. Comments inline.

On Fri, Mar 10, 2017 at 6:25 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> I get your point. But I am not sure we should recommend user to simply
> remove disk from the broker config. If user simply does this without
> checking the utilization of good disks, replica on the bad disk will be
> re-created on the good disk and may overload the good disks, causing
> cascading failure.
>

Good point.

>
> I agree with you and Colin that slow disk may cause problem. However,
> performance degradation due to slow disk this is an existing problem that
> is not detected or handled by Kafka or KIP-112.

I think an important difference is that a number of disk errors are currently fatal and won't be after KIP-112. So it introduces new scenarios (for example, bouncing a broker that is working fine although some disks have been marked bad).

> Detection and handling of
> slow disk is a separate problem that needs to be addressed in a future KIP.
> It is currently listed in the future work. Does this sound OK?
>

I'm OK with it being handled in the future. In the meantime, I was just hoping that we can make it clear to users about the potential issue of a disk marked as bad becoming good again after a bounce (which can be dangerous).

> The main benefit of creating the second topic after log directory goes
> offline is that we can make sure the second topic is created on the good
> log directory. I am not sure we can simply assume that the first topic will
> always be created on the first log directory in the broker config and the
> second topic will be created on the second log directory in the broker
> config.
>
> However, I can add this test in KIP-113 which allows user to
> re-assign replica to specific log directory of a broker. Is this OK?
>

OK. Please add a note to KIP-112 about this as well (so that it's clear why we only do it in KIP-113).

Ismael