[ https://issues.apache.org/jira/browse/KAFKA-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753257#comment-13753257 ]
Swapnil Ghike commented on KAFKA-1032:
--------------------------------------
The problem is that the leader that went into GC did not receive the
become-follower request from the controller soon enough, so it kept acting as
the leader for some time after the GC pause and appended new messages. These
messages were lost when the affected broker became a follower.
Another approach to fixing this could involve changing
OfflinePartitionLeaderSelector to send LeaderAndIsrRequest to dead brokers as
well. This would ensure that the old leader (if still alive) stops acting as
the leader much sooner.
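To make the loss window concrete, here is a toy simulation. It is only a
sketch under simplifying assumptions, not Kafka code: the Broker class, the
simulate function and the tick-based timeline are all hypothetical, the new
leader is assumed to be fully caught up at election time, and becoming a
follower is modeled as resetting the log to the new leader's log.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy broker holding one partition's log (hypothetical model, not Kafka code)."""
    name: str
    is_leader: bool = False
    log: list = field(default_factory=list)

    def append(self, msg):
        # A broker accepts produce requests only while it believes it is the leader.
        if not self.is_leader:
            return False
        self.log.append(msg)
        return True

    def become_follower(self, leader_log):
        # Simplification: the follower's log is reset to the new leader's log,
        # so anything the new leader never saw is dropped.
        lost = [m for m in self.log if m not in leader_log]
        self.log = list(leader_log)
        self.is_leader = False
        return lost


def simulate(notify_at, total_ticks=6, election_tick=2):
    """Send one message per tick to the old leader (producer has stale metadata).

    notify_at is the tick at which the old leader finally receives the
    become-follower request. Returns the messages lost on the old leader.
    """
    old = Broker("old", is_leader=True)
    new = Broker("new")
    lost = []
    for tick in range(total_ticks):
        if tick == election_tick:
            # Controller marks `old` as failed and elects `new`;
            # `old` is not told and still believes it is the leader.
            new.log = list(old.log)
            new.is_leader = True
        if tick == notify_at:
            lost = old.become_follower(new.log)
        old.append(f"m{tick}")
    return lost


# Late notification (e.g. only on broker startup): the whole window is lost.
print(simulate(notify_at=5))
# Notification sent right at election time: nothing is lost.
print(simulate(notify_at=2))
```

In this model, the later the become-follower request arrives after the
election, the more messages fall into the window. Sending the request to the
supposedly dead broker at election time closes the window, because the old
leader then rejects further appends.
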
> Messages sent to the old leader will be lost on broker GC resulted failure
> --------------------------------------------------------------------------
>
> Key: KAFKA-1032
> URL: https://issues.apache.org/jira/browse/KAFKA-1032
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
>
> As pointed out by Swapnil, today when a broker is in a long GC pause, it will
> be marked by the controller as failed, which triggers the onBrokerFailure
> function to migrate leadership to other brokers. However, the controller does
> not notify the broker with a stopReplica request even after a new leader has
> been elected for its partitions. The new leader will hence stop fetching from
> the old leader while the old leader is not aware that it is no longer the
> leader. And since the old leader is not really dead, producers will not
> refresh their metadata immediately and will continue sending messages to the
> old leader. The old leader will only learn that it is no longer the leader
> when it gets notified by the controller in the onBrokerStartup function, and
> messages sent between the time the new leader is elected and the time the old
> leader realizes it is no longer the leader will be lost.