[
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180328#comment-14180328
]
Joel Koshy commented on KAFKA-1647:
-----------------------------------
[~becket_qin] here are some steps to reproduce locally. There are probably
simpler steps, but I ran into it while debugging something else, so here you go:
* Set up three brokers. Sample config:
https://gist.github.com/jjkoshy/1ec36e5cef41ac4bd8fb (You will need to edit the
logs directory and port)
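The per-broker overrides end up looking roughly like this (broker id, port and
log directory below are placeholders; the other settings can stay at the
defaults): {code}# broker 1; repeat with broker.id=2/3, port=9093/9094 and a different log.dirs
broker.id=1
port=9092
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181{code}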
* Create 50 topics; each with 4 partitions; replication factor 2 {code}for i
in {1..50}; do ./bin/kafka-topics.sh --create --topic test$i --zookeeper
localhost:2181 --partitions 4 --replication-factor 2; done{code}
* Run producer performance: {code}./bin/kafka-producer-perf-test.sh --threads 4
--broker-list localhost:9092,localhost:9093 --vary-message-size --messages
922337203685477580 --topics
test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11,test12,test13,test14,test15,test16,test17,test18,test19,test20,test21,test22,test23,test24,test25,test26,test27,test28,test29,test30,test31,test32,test33,test34,test35,test36,test37,test38,test39,test40,test41,test42,test43,test44,test45,test46,test47,test48,test49,test50
--message-size 500{code}
* Parallel hard kill of all brokers: {{pkill -9 -f Kafka}}
* Kill producer performance
* Restart brokers
* You should see "WARN No checkpointed highwatermark is found for partition..."
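To confirm, grep the broker logs for the warning and look at the
replication-offset-checkpoint file in each broker's log directory (the paths
below assume the sample config above and the default log4j setup): {code}grep "No checkpointed highwatermark" logs/server.log
cat /tmp/kafka-logs-1/replication-offset-checkpoint{code}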
> Replication offset checkpoints (high water marks) can be lost on hard kills
> and restarts
> ----------------------------------------------------------------------------------------
>
> Key: KAFKA-1647
> URL: https://issues.apache.org/jira/browse/KAFKA-1647
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2
> Reporter: Joel Koshy
> Assignee: Jiangjie Qin
> Priority: Critical
> Labels: newbie++
> Fix For: 0.8.2
>
> Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch,
> KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch
>
>
> We ran into this scenario recently in a production environment. This can
> happen when enough brokers in a cluster are taken down, i.e., when all
> replicas for some partition are taken down at once; a rolling bounce done
> properly should not cause this issue.
> Here is a sample scenario:
> * Cluster of three brokers: b0, b1, b2
> * Two partitions (of some topic) with replication factor two: p0, p1
> * Initial state:
> p0: leader = b0, ISR = {b0, b1}
> p1: leader = b1, ISR = {b0, b1}
> * Do a parallel hard-kill of all brokers
> * Bring up b2, so it is the new controller
> * b2 initializes its controller context and populates its leader/ISR cache
> (i.e., controllerContext.partitionLeadershipInfo) from ZooKeeper. The last
> known leaders are b0 (for p0) and b1 (for p1)
> * Bring up b1
> * The controller's onBrokerStartup procedure initiates a replica state change
> for all replicas on b1 to become online. As part of this replica state change
> it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1
> (for p0 and p1). This LeaderAndIsrRequest contains: {{p0: leader=b0; p1:
> leader=b1; leaders=b1}}. b0 is indicated as the leader of p0, but it is not
> included in the leaders field because b0 is down.
> * On receiving the LeaderAndIsrRequest, b1's replica manager will
> successfully make itself (b1) the leader for p1 (and create the local replica
> object corresponding to p1). It will however abort the become follower
> transition for p0 because the designated leader b0 is offline. So it will not
> create the local replica object for p0.
> * It will then start the high water mark checkpoint thread. Since only p1 has
> a local replica object, only p1's high water mark will be checkpointed to
> disk. p0's previously written checkpoint, if any, will be lost (see the
> example below).
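> To make this concrete, here is roughly what the replication-offset-checkpoint
> file in the affected log directory looks like before the hard kill (the format
> is a version line, an entry count, then one "topic partition offset" line per
> partition; the topic name and offsets are made up): {code}0
> 2
> foo 0 1200
> foo 1 950{code}
> After b1 restarts and the checkpoint thread runs, the file is rewritten with
> p0's entry dropped: {code}0
> 1
> foo 1 950{code}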
> So in summary it seems we should always create the local replica object even
> if the online transition does not happen.
> Possible symptoms of the above bug could be one or more of the following (we
> saw 2 and 3):
> # Data loss; yes, some data loss is expected on a hard kill, but this can
> actually cause loss of nearly all data if the broker becomes a follower,
> truncates its log, and soon after happens to become the leader.
> # High IO on brokers that lose their high water mark and then (on a
> successful become-follower transition) truncate their log to zero and start
> catching up from the beginning.
> # If the offsets topic is affected, then offsets can get reset. This is
> because during an offset load we don't read past the high water mark. So if
> the high water mark is missing, we don't load anything (even if the offsets
> are there in the log).