[jira] [Commented] (KAFKA-1106) HighwaterMarkCheckpoint failure putting broker into a bad state

David Lao (JIRA) Tue, 29 Oct 2013 21:48:42 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808741#comment-13808741
 ]


David Lao commented on KAFKA-1106:
----------------------------------

No, there was no chance of manual intervention. However, the broker node in 
question appears to have gone through a fail-fast-like exit and recovery a few 
hours earlier, but it was working fine until it hit this bug. Could a corrupted 
file have led to this? If so, is failing fast the right way to handle the situation?

> HighwaterMarkCheckpoint failure putting broker into a bad state
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1106
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1106
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: David Lao
>         Attachments: KAFKA-1106-patch, kafka.log
>
>
> I'm encountering a case where a broker gets stuck because HighwaterMarkCheckpoint 
> fails to recover after reading what appear to be corrupted ISR entries. Once 
> in this state, leader election can never succeed, which stalls the 
> entire cluster. 
> Please see the detailed stack trace in the attached log. Perhaps failing 
> fast when HighwaterMarkCheckpoint fails to read the file would force the broker to 
> restart and recover.  



--
This message was sent by Atlassian JIRA
(v6.1#6144)
