[ https://issues.apache.org/jira/browse/KAFKA-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147070#comment-16147070 ]
Ivan Babrou commented on KAFKA-3039:
------------------------------------

We also experienced this: out of 28 upgraded nodes in one rack, 4 nodes each decided to nuke 1 partition (a different partition on each node):

{noformat}
2017-08-30T10:17:29.509 node-93 WARN [ReplicaFetcherThread-0-10042]: Based on follower's leader epoch, leader replied with an unknown offset in requests-48. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:17:29.510 node-93 INFO Truncating log requests-48 to offset 0. (kafka.log.Log)
--
2017-08-30T10:17:29.536 node-93 WARN [ReplicaFetcherThread-0-10082]: Based on follower's leader epoch, leader replied with an unknown offset in requests-80. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:17:29.536 node-93 INFO Truncating log requests-80 to offset 0. (kafka.log.Log)
--
2017-08-30T10:26:32.203 node-87 WARN [ReplicaFetcherThread-2-10056]: Based on follower's leader epoch, leader replied with an unknown offset in requests-82. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:26:32.204 node-87 INFO Truncating log requests-82 to offset 0. (kafka.log.Log)
--
2017-08-30T10:27:31.755 node-89 WARN [ReplicaFetcherThread-3-10055]: Based on follower's leader epoch, leader replied with an unknown offset in requests-79. High watermark 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
2017-08-30T10:27:31.756 node-89 INFO Truncating log requests-79 to offset 0. (kafka.log.Log)
{noformat}

This was a rolling upgrade from 0.10.2.0 to 0.11.0.0. The nodes that truncated their logs were not leaders before the upgrade (not even preferred leaders).
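The WARN lines above describe a fallback in the follower's truncation logic: the follower asks the leader for the end offset of its last leader epoch, and when the leader replies with an unknown/undefined offset (presumably because it could not yet answer the epoch request mid-upgrade), the follower falls back to truncating to its own high watermark, which here was 0. A simplified, self-contained sketch of that decision, not Kafka's actual code; names and values are illustrative only:

{code:scala}
// Simplified sketch of the follower-side truncation decision behind the
// WARN lines above. Not Kafka's real implementation; names are illustrative.
object EpochTruncationSketch {
  // Sentinel meaning "the leader could not resolve an end offset for the epoch".
  val UndefinedEpochOffset: Long = -1L

  def truncationOffset(leaderEpochEndOffset: Long, highWatermark: Long): Long =
    if (leaderEpochEndOffset == UndefinedEpochOffset) {
      // Fallback path: leader replied with an unknown offset, so the follower
      // truncates to its high watermark -- 0 if it has never checkpointed one.
      highWatermark
    } else {
      // Normal path (simplified): truncate to the leader's epoch end offset.
      leaderEpochEndOffset
    }

  def main(args: Array[String]): Unit = {
    // Mid-upgrade scenario from the logs: undefined epoch offset, HW = 0.
    println(truncationOffset(leaderEpochEndOffset = UndefinedEpochOffset,
                             highWatermark = 0L)) // prints 0
  }
}
{code}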
> Temporary loss of leader resulted in log being completely truncated
> -------------------------------------------------------------------
>
>                 Key: KAFKA-3039
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3039
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.9.0.0
>         Environment: Debian 3.2.54-2 x86_64 GNU/Linux
>            Reporter: Imran Patel
>            Priority: Critical
>              Labels: reliability
>
> We had an event recently where the temporary loss of a leader for a partition (during a manual restart) resulted in the leader coming back with no high watermark state and truncating its log to zero. Logs (attached below) indicate that it did have the data but not the commit state. How is this possible?
> Leader (broker 3)
> [2015-12-18 21:19:44,666] INFO Completed load of log messages-14 with log end offset 14175963374 (kafka.log.Log)
> [2015-12-18 21:19:45,170] INFO Partition [messages,14] on broker 3: No checkpointed highwatermark is found for partition [messages,14] (kafka.cluster.Partition)
> [2015-12-18 21:19:45,238] INFO Truncating log messages-14 to offset 0. (kafka.log.Log)
> [2015-12-18 21:20:34,066] INFO Partition [messages,14] on broker 3: Expanding ISR for partition [messages,14] from 3 to 3,10 (kafka.cluster.Partition)
> Replica (broker 10)
> [2015-12-18 21:19:19,525] INFO Partition [messages,14] on broker 10: Shrinking ISR for partition [messages,14] from 3,10,4 to 10,4 (kafka.cluster.Partition)
> [2015-12-18 21:20:34,049] ERROR [ReplicaFetcherThread-0-3], Current offset 14175984203 for partition [messages,14] out of range; reset offset to 35977 (kafka.server.ReplicaFetcherThread)
> [2015-12-18 21:20:34,033] WARN [ReplicaFetcherThread-0-3], Replica 10 for partition [messages,14] reset its fetch offset from 14175984203 to current leader 3's latest offset 35977 (kafka.server.ReplicaFetcherThread)
> Some relevant config parameters:
> offsets.topic.replication.factor = 3
> offsets.commit.required.acks = -1
> replica.high.watermark.checkpoint.interval.ms = 5000
> unclean.leader.election.enable = false
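On the "No checkpointed highwatermark is found" INFO line above: the broker persists per-partition high watermarks in a replication-offset-checkpoint file, and a partition with no entry there comes back with a high watermark of 0, which is what the restarted leader then truncated to. A minimal reading sketch, assuming the plain-text version-0 checkpoint layout (a version line, an entry-count line, then one "topic partition offset" line per partition); not Kafka's actual code:

{code:scala}
import java.io.File
import scala.io.Source

// Minimal sketch of reading a replication-offset-checkpoint file, assuming the
// plain-text version-0 layout: a version line, an entry-count line, then one
// "topic partition offset" line per partition. Not Kafka's actual code.
object HighWatermarkCheckpointSketch {
  def readCheckpoint(file: File): Map[(String, Int), Long] = {
    val source = Source.fromFile(file)
    try {
      source.getLines()
        .drop(2)                       // skip the version and entry-count lines
        .filter(_.trim.nonEmpty)
        .map { line =>
          val parts = line.trim.split("\\s+")
          (parts(0), parts(1).toInt) -> parts(2).toLong
        }
        .toMap
    } finally source.close()
  }

  // A partition missing from the checkpoint falls back to a high watermark of 0,
  // which matches broker 3 logging "No checkpointed highwatermark is found" and
  // then truncating messages-14 all the way to offset 0.
  def highWatermark(checkpoints: Map[(String, Int), Long],
                    topic: String, partition: Int): Long =
    checkpoints.getOrElse((topic, partition), 0L)
}
{code}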