[ https://issues.apache.org/jira/browse/KAFKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808741#comment-13808741 ]
David Lao commented on KAFKA-1106:
----------------------------------

No, there was no chance of manual intervention. However, the broker node in question appears to have gone through a fail-fast-style exit and recovery a few hours earlier, and it had been working fine until it hit this bug. Could a corrupted file have led to this? If so, is failing fast the right way to handle the situation?

> HighwaterMarkCheckpoint failure puting broker into a bad state
> --------------------------------------------------------------
>
>                 Key: KAFKA-1106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1106
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: David Lao
>         Attachments: KAFKA-1106-patch, kafka.log
>
>
> I'm encountering a case where a broker gets stuck because HighwaterMarkCheckpoint
> fails to recover from reading what appear to be corrupted ISR entries. Once
> in this state, leader election can never succeed, which stalls the
> entire cluster.
> Please see the detailed stack trace in the attached log. Perhaps failing
> fast when HighwaterMarkCheckpoint fails to read would force the broker to
> restart and recover.
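To make the fail-fast proposal concrete, here is a minimal sketch in Python rather than Kafka's own Scala. It assumes a checkpoint layout of a version line, an entry-count line, then one "topic partition offset" line per entry. That layout is an assumption for illustration, not a statement of Kafka's exact on-disk format, and the names `parse_checkpoint`, `load_or_exit` and `CorruptCheckpointError` are hypothetical. The idea is that any malformed content raises one well-defined error, and the caller turns that error into an immediate process exit instead of carrying on with partial state.

```python
import sys
from typing import Dict, Tuple

# Hypothetical names and an assumed file layout; not Kafka's actual API.
EXPECTED_VERSION = 0  # assumed checkpoint format version


class CorruptCheckpointError(Exception):
    """Raised when the checkpoint content cannot be trusted."""


def parse_checkpoint(text: str) -> Dict[Tuple[str, int], int]:
    """Parse 'version / count / topic partition offset' lines strictly."""
    lines = text.splitlines()
    try:
        version = int(lines[0])
        if version != EXPECTED_VERSION:
            raise CorruptCheckpointError(f"unsupported version {version}")
        expected = int(lines[1])
        entries = lines[2:]
        if len(entries) != expected:
            raise CorruptCheckpointError(
                f"expected {expected} entries, found {len(entries)}")
        result: Dict[Tuple[str, int], int] = {}
        for line in entries:
            # Unpacking fails with ValueError unless there are exactly 3 fields.
            topic, partition, offset_s = line.split()
            offset = int(offset_s)
            if offset < 0:
                raise CorruptCheckpointError(f"negative offset in {line!r}")
            result[(topic, int(partition))] = offset
        return result
    except (IndexError, ValueError) as exc:
        # Missing lines or non-numeric fields both mean a corrupt file.
        raise CorruptCheckpointError(str(exc)) from exc


def load_or_exit(path: str) -> Dict[Tuple[str, int], int]:
    """Fail fast: exit the process rather than run with a bad checkpoint."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_checkpoint(f.read())
    except (OSError, UnicodeDecodeError, CorruptCheckpointError) as exc:
        print(f"FATAL: cannot read checkpoint {path}: {exc}", file=sys.stderr)
        sys.exit(1)
```

The design choice is to treat every parse failure as fatal. That way a process supervisor restarts the broker from a known state, rather than letting it keep serving with entries it cannot interpret. Whether a restart actually recovers depends on how the broker rebuilds the file. This sketch does not address that.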