[ https://issues.apache.org/jira/browse/KAFKA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005715#comment-15005715 ]
ASF GitHub Bot commented on KAFKA-2841:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/530

    KAFKA-2841: safe group metadata cache loading/unloading

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-2841

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #530

----
commit 881380eac954e0906ef2ec0fe3d5d8e067473a35
Author: Jason Gustafson <ja...@confluent.io>
Date:   2015-11-14T23:54:25Z

    KAFKA-2841: safe group metadata cache loading/unloading

----

> Group metadata cache loading is not safe when reloading a partition
> -------------------------------------------------------------------
>
>                 Key: KAFKA-2841
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2841
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Blocker
>             Fix For: 0.9.0.0
>
>
> If the coordinator receives a leaderAndIsr request which includes a higher
> leader epoch for one of the partitions that it owns, then it will reload the
> offset/metadata for that partition again. This can happen because the leader
> epoch is incremented for ISR changes which do not result in a new leader for
> the partition. Currently, the coordinator replaces cached metadata values
> blindly on reloading, which can result in weird behavior such as unexpected
> session timeouts or request timeouts while rebalancing.
> To fix this, we need to check that the group being loaded has a higher
> generation than the cached value before replacing it. Also, if we have to
> replace a cached value (which shouldn't happen except when loading), we need
> to be very careful to ensure that any active delayed operations won't affect
> the group.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
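
For illustration only, the generation check described in the issue could be sketched roughly as below. This is not the code from the pull request: the class `GroupCacheSketch`, the `Group` type, and the `load` method are hypothetical names made up for this example, and the sketch covers only the "replace only if the loaded generation is higher" rule. It does not attempt the handling of active delayed operations that the issue also calls for.

```java
import java.util.concurrent.ConcurrentHashMap;

public class GroupCacheSketch {
    // Hypothetical stand-in for the coordinator's per-group metadata.
    static final class Group {
        final String groupId;
        final int generationId;

        Group(String groupId, int generationId) {
            this.groupId = groupId;
            this.generationId = generationId;
        }
    }

    private final ConcurrentHashMap<String, Group> cache = new ConcurrentHashMap<>();

    // Called when a partition is (re)loaded. The cached entry is replaced
    // only if the loaded group has a strictly higher generation; otherwise
    // the cached entry is kept. Returns whichever group is cached afterwards.
    Group load(Group loaded) {
        return cache.merge(loaded.groupId, loaded,
            (cached, incoming) ->
                incoming.generationId > cached.generationId ? incoming : cached);
    }

    public static void main(String[] args) {
        GroupCacheSketch sketch = new GroupCacheSketch();
        sketch.load(new Group("my-group", 5));
        // A reload that carries an older generation must not clobber the cache.
        Group afterStaleReload = sketch.load(new Group("my-group", 3));
        System.out.println("generation after stale reload: " + afterStaleReload.generationId);
        // A reload that carries a newer generation replaces the cached value.
        Group afterNewerReload = sketch.load(new Group("my-group", 6));
        System.out.println("generation after newer reload: " + afterNewerReload.generationId);
    }
}
```

Using `ConcurrentHashMap.merge` keeps the compare-and-replace atomic per key, so a reload cannot interleave with another update of the same group in this sketch.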