[ 
https://issues.apache.org/jira/browse/KAFKA-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005715#comment-15005715
 ] 

ASF GitHub Bot commented on KAFKA-2841:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/530

    KAFKA-2841: safe group metadata cache loading/unloading

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-2841

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/530.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #530
    
----
commit 881380eac954e0906ef2ec0fe3d5d8e067473a35
Author: Jason Gustafson <ja...@confluent.io>
Date:   2015-11-14T23:54:25Z

    KAFKA-2841: safe group metadata cache loading/unloading

----


> Group metadata cache loading is not safe when reloading a partition
> -------------------------------------------------------------------
>
>                 Key: KAFKA-2841
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2841
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Blocker
>             Fix For: 0.9.0.0
>
>
> If the coordinator receives a LeaderAndIsr request which includes a higher 
> leader epoch for one of the partitions that it owns, it reloads the 
> offsets and metadata for that partition. This can happen because the leader 
> epoch is also incremented for ISR changes which do not result in a new 
> leader for the partition. Currently, the coordinator blindly replaces cached 
> metadata values on reloading, which can result in incorrect behavior such as 
> unexpected session timeouts or request timeouts while rebalancing.
>
> To fix this, we need to check that the group being loaded has a higher 
> generation than the cached value before replacing it. Also, if we have to 
> replace a cached value (which shouldn't happen except when loading), we need 
> to be very careful to ensure that any active delayed operations won't affect 
> the group. 
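
The generation check described above can be illustrated with a minimal sketch. This is not the code from the pull request (the Kafka coordinator is written in Scala); every name here (GroupMetadata, GroupCache, load, delayed_ops, dead) is hypothetical and chosen only for illustration. It assumes that a cached group is replaced only when the loaded group has a strictly higher generation, and that the replaced instance is marked dead and its pending operations dropped so they cannot act on the new one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class GroupMetadata:
    """Hypothetical stand-in for a coordinator's cached group state."""
    group_id: str
    generation: int
    # Stand-in for pending delayed operations (e.g. heartbeat or join timers).
    delayed_ops: List[Callable[[], None]] = field(default_factory=list)
    dead: bool = False


class GroupCache:
    """Illustrative cache that refuses to overwrite newer state on reload."""

    def __init__(self) -> None:
        self._groups: Dict[str, GroupMetadata] = {}

    def get(self, group_id: str) -> Optional[GroupMetadata]:
        return self._groups.get(group_id)

    def load(self, loaded: GroupMetadata) -> bool:
        """Install `loaded` only if it is newer than the cached entry.

        Returns True if the cache was updated, False if the cached
        value was kept.
        """
        cached = self._groups.get(loaded.group_id)
        if cached is not None and loaded.generation <= cached.generation:
            # Reloading the same partition: keep the live cached group.
            return False
        if cached is not None:
            # Assumption: retire the old instance so that any operation
            # still holding a reference to it sees it as dead.
            cached.dead = True
            cached.delayed_ops.clear()
        self._groups[loaded.group_id] = loaded
        return True
```

With this shape, a reload triggered by an ISR-only epoch bump finds the same generation already cached and leaves the live group, along with its timers, untouched.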



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
