Imran Patel created KAFKA-3039:
----------------------------------

             Summary: Temporary loss of leader resulted in log being completely 
truncated
                 Key: KAFKA-3039
                 URL: https://issues.apache.org/jira/browse/KAFKA-3039
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.9.0.0
         Environment: Debian 3.2.54-2 x86_64 GNU/Linux
            Reporter: Imran Patel
            Priority: Critical


We had an event recently where the temporary loss of a leader for a partition 
(during a manual restart) resulted in the leader coming back with no high 
watermark state and truncating its log to zero. The logs (included below) 
indicate that it did have the data but not the commit state. How is this 
possible?

Leader (broker 3)
[2015-12-18 21:19:44,666] INFO Completed load of log messages-14 with log end 
offset 14175963374 (kafka.log.Log)
[2015-12-18 21:19:45,170] INFO Partition [messages,14] on broker 3: No 
checkpointed highwatermark is found for partition [messages,14] 
(kafka.cluster.Partition)
[2015-12-18 21:19:45,238] INFO Truncating log messages-14 to offset 0. 
(kafka.log.Log)
[2015-12-18 21:20:34,066] INFO Partition [messages,14] on broker 3: Expanding 
ISR for partition [messages,14] from 3 to 3,10 (kafka.cluster.Partition)

Replica (broker 10)
[2015-12-18 21:19:19,525] INFO Partition [messages,14] on broker 10: Shrinking 
ISR for partition [messages,14] from 3,10,4 to 10,4 (kafka.cluster.Partition)
[2015-12-18 21:20:34,049] ERROR [ReplicaFetcherThread-0-3], Current offset 
14175984203 for partition [messages,14] out of range; reset offset to 35977 
(kafka.server.ReplicaFetcherThread)
[2015-12-18 21:20:34,033] WARN [ReplicaFetcherThread-0-3], Replica 10 for 
partition [messages,14] reset its fetch offset from 14175984203 to current 
leader 3's latest offset 35977 (kafka.server.ReplicaFetcherThread)
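
To make the sequence in the two log excerpts concrete, here is a minimal 
Python sketch of what they appear to show. It is a simplified model, not 
Kafka's code: ReplicaLog, recovered_high_watermark and next_fetch_offset are 
hypothetical names, and the two rules they encode (a missing checkpoint entry 
is treated as high watermark 0, and a follower that is ahead of the leader 
resets to the leader's end offset) are assumptions inferred from the log 
messages above.

```python
from dataclasses import dataclass


@dataclass
class ReplicaLog:
    """Toy stand-in for one partition's log on one broker (hypothetical)."""
    log_end_offset: int

    def truncate_to(self, offset: int) -> None:
        # Truncation can only shrink the log, never extend it.
        self.log_end_offset = min(self.log_end_offset, offset)


def recovered_high_watermark(checkpoints: dict, partition: str,
                             log_end_offset: int) -> int:
    # ASSUMPTION: a partition missing from the checkpoint gets high watermark 0.
    return min(checkpoints.get(partition, 0), log_end_offset)


def next_fetch_offset(follower_offset: int, leader_end_offset: int) -> int:
    # ASSUMPTION: a follower ahead of the leader resets to the leader's end offset.
    if follower_offset > leader_end_offset:
        return leader_end_offset
    return follower_offset


# Broker 3 restarts with its data intact but no checkpoint entry for messages-14.
broker3 = ReplicaLog(log_end_offset=14175963374)
hw = recovered_high_watermark({}, "messages-14", broker3.log_end_offset)
broker3.truncate_to(hw)
offset_after_truncation = broker3.log_end_offset

# Broker 3 then takes new writes; 35977 is the leader offset seen by broker 10.
broker3.log_end_offset = 35977
broker10_fetch = next_fetch_offset(14175984203, broker3.log_end_offset)
```

In this model the data on disk does not protect broker 3: the truncation 
point comes only from the checkpoint lookup, so an absent entry is enough to 
discard the whole log, and broker 10 then follows the shorter log.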

Some relevant config parameters:
        offsets.topic.replication.factor = 3
        offsets.commit.required.acks = -1
        replica.high.watermark.checkpoint.interval.ms = 5000
        unclean.leader.election.enable = false
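
One way to investigate a "No checkpointed highwatermark is found" message is 
to look at the high watermark checkpoint file in the broker's log directory. 
The sketch below parses such a file under the assumption that it uses the 
plain-text layout of a version line, an entry count, and then one 
"topic partition offset" line per entry. The function name 
parse_offset_checkpoint and the sample contents are hypothetical and are not 
taken from the affected cluster.

```python
def parse_offset_checkpoint(text: str) -> dict:
    """Parse checkpoint text into {(topic, partition): offset}.

    ASSUMED layout: line 1 = version (0), line 2 = number of entries,
    remaining lines = "topic partition offset".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    version = int(lines[0])
    if version != 0:
        raise ValueError(f"unexpected checkpoint version {version}")
    expected = int(lines[1])
    entries = {}
    for ln in lines[2:]:
        topic, partition, offset = ln.split()
        entries[(topic, int(partition))] = int(offset)
    if len(entries) != expected:
        raise ValueError(f"expected {expected} entries, found {len(entries)}")
    return entries


# Hypothetical sample: two partitions are checkpointed, messages-14 is not.
sample = "0\n2\nmessages 13 14175000000\nmessages 15 14176000000\n"
entries = parse_offset_checkpoint(sample)
has_hw_for_14 = ("messages", 14) in entries
```

If a partition is absent from the parsed entries, as messages-14 is in this 
sample, a broker starting from that file would have no stored high watermark 
for it.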



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)