[
https://issues.apache.org/jira/browse/KAFKA-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180328#comment-14180328
]
Joel Koshy commented on KAFKA-1647:
-----------------------------------
[~becket_qin] here are some steps to reproduce locally. There are probably
simpler steps, but I ran into it while debugging something else, so here you go:
* Set up three brokers. Sample config:
https://gist.github.com/jjkoshy/1ec36e5cef41ac4bd8fb (You will need to edit the
logs directory and port)
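The per-broker overrides end up looking roughly like this (broker id, port and
log directory below are placeholders; the other settings can stay at the
defaults): {code}# broker 1; repeat with broker.id=2/3, port=9093/9094 and a different log.dirs
broker.id=1
port=9092
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181{code}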
* Create 50 topics; each with 4 partitions; replication factor 2 {code}for i
in {1..50}; do ./bin/kafka-topics.sh --create --topic test$i --zookeeper
localhost:2181 --partitions 4 --replication-factor 2; done{code}
* Run producer performance: {code}./bin/kafka-producer-perf-test.sh --threads 4
--broker-list localhost:9092,localhost:9093 --vary-message-size --messages
922337203685477580 --topics
test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11,test12,test13,test14,test15,test16,test17,test18,test19,test20,test21,test22,test23,test24,test25,test26,test27,test28,test29,test30,test31,test32,test33,test34,test35,test36,test37,test38,test39,test40,test41,test42,test43,test44,test45,test46,test47,test48,test49,test50
--message-size 500{code}
* Parallel hard kill of all brokers: {{pkill -9 -f Kafka}}
* Kill producer performance
* Restart brokers
* You should see "WARN No checkpointed highwatermark is found for partition..."
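To confirm, grep the broker logs for the warning and look at the
replication-offset-checkpoint file in each broker's log directory (the paths
below assume the sample config above and the default log4j setup): {code}grep "No checkpointed highwatermark" logs/server.log
cat /tmp/kafka-logs-1/replication-offset-checkpoint{code}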
> Replication offset checkpoints (high water marks) can be lost on hard kills
> and restarts
> ----------------------------------------------------------------------------------------
>
> Key: KAFKA-1647
> URL: https://issues.apache.org/jira/browse/KAFKA-1647
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2
> Reporter: Joel Koshy
> Assignee: Jiangjie Qin
> Priority: Critical
> Labels: newbie++
> Fix For: 0.8.2
>
> Attachments: KAFKA-1647.patch, KAFKA-1647_2014-10-13_16:38:39.patch,
> KAFKA-1647_2014-10-18_00:26:51.patch, KAFKA-1647_2014-10-21_23:08:43.patch
>
>
> We ran into this scenario recently in a production environment. This can
> happen when enough brokers in a cluster are taken down, i.e., when all
> replicas for some partition are taken down at once; a rolling bounce done
> properly should not cause this issue.
> Here is a sample scenario:
> * Cluster of three brokers: b0, b1, b2
> * Two partitions (of some topic) with replication factor two: p0, p1
> * Initial state:
> p0: leader = b0, ISR = {b0, b1}
> p1: leader = b1, ISR = {b0, b1}
> * Do a parallel hard-kill of all brokers
> * Bring up b2, so it is the new controller
> * b2 initializes its controller context and populates its leader/ISR cache
> (i.e., controllerContext.partitionLeadershipInfo) from ZooKeeper. The last
> known leaders are b0 (for p0) and b1 (for p1)
> * Bring up b1
> * The controller's onBrokerStartup procedure initiates a replica state change
> for all replicas on b1 to become online. As part of this replica state change
> it gets the last known leader and ISR and sends a LeaderAndIsrRequest to b1
> (for p0 and p1). This LeaderAndIsrRequest contains: {{p0: leader=b0; p1:
> leader=b1; leaders=b1}}. b0 is indicated as the leader of p0, but it is not
> included in the leaders field because b0 is down.
> * On receiving the LeaderAndIsrRequest, b1's replica manager will
> successfully make itself (b1) the leader for p1 (and create the local replica
> object corresponding to p1). It will however abort the become follower
> transition for p0 because the designated leader b0 is offline. So it will not
> create the local replica object for p0.
> * It will then start the high water mark checkpoint thread. Since only p1 has
> a local replica object, only p1's high water mark will be checkpointed to
> disk. p0's previously written checkpoint, if any, will be lost (see the
> example below).
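> To make this concrete, here is roughly what the replication-offset-checkpoint
> file in the affected log directory looks like before the hard kill (the format
> is a version line, an entry count, then one "topic partition offset" line per
> partition; the topic name and offsets are made up): {code}0
> 2
> foo 0 1200
> foo 1 950{code}
> After b1 restarts and the checkpoint thread runs, the file is rewritten with
> p0's entry dropped: {code}0
> 1
> foo 1 950{code}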
> So in summary it seems we should always create the local replica object even
> if the online transition does not happen.
> Possible symptoms of the above bug could be one or more of the following (we
> saw 2 and 3):
> # Data loss; yes, some data loss is expected on a hard kill, but this can
> actually cause loss of nearly all data if the broker becomes a follower,
> truncates its log, and soon after happens to become the leader.
> # High IO on brokers that lose their high water mark and then (on a
> successful become-follower transition) truncate their log to zero and start
> catching up from the beginning.
> # If the offsets topic is affected, then offsets can get reset. This is
> because during an offset load we don't read past the high water mark. So if
> the high water mark is missing, we don't load anything (even if the offsets
> are there in the log).