[ 
https://issues.apache.org/jira/browse/KAFKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840663#comment-15840663
 ] 

Onur Karaman commented on KAFKA-4704:
-------------------------------------

It might be worth mentioning that I hit a similar scenario when implementing 
the new consumer migration. Rolling back from the migration-aware old consumer 
(or just the new consumer) to the old consumer with Kafka-based offset storage 
(or {{dual.commit.enabled}}) causes the old consumer's offset commits to fail 
silently. This is because, by that point, the group has been added to the 
{{GroupCoordinator}} and its generation id has been incremented to be >= 0. The 
old consumer, on the other hand, naively sends {{OffsetCommitRequest}}s 
with an empty member id and a generation id of -1, so the {{GroupCoordinator}} 
rejects the request with {{UNKNOWN_MEMBER_ID}}.
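
To make the rollback failure concrete, here is a minimal toy model of the 
check described above. It is only a sketch: {{ToyCoordinator}}, {{Group}}, 
{{join}} and {{commit_offsets}} are hypothetical names, the generation 
numbering is simplified, and the handling of an unknown group is an 
assumption made for illustration. It is not the broker's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Error names mirror the ones mentioned in the comment; plain strings here.
NONE = "NONE"
UNKNOWN_MEMBER_ID = "UNKNOWN_MEMBER_ID"
ILLEGAL_GENERATION = "ILLEGAL_GENERATION"


@dataclass
class Group:
    generation_id: int = -1
    members: Set[str] = field(default_factory=set)


class ToyCoordinator:
    """Hypothetical, heavily simplified stand-in for a group coordinator."""

    def __init__(self) -> None:
        self.groups: Dict[str, Group] = {}

    def join(self, group_id: str, member_id: str) -> int:
        # Joining registers the group and bumps its generation to >= 0.
        group = self.groups.setdefault(group_id, Group())
        group.members.add(member_id)
        group.generation_id += 1
        return group.generation_id

    def commit_offsets(self, group_id: str, member_id: str, generation_id: int) -> str:
        group = self.groups.get(group_id)
        if group is None:
            # Assumption: an unknown group with generation -1 is treated as
            # plain offset storage, so the commit is accepted.
            return NONE if generation_id < 0 else ILLEGAL_GENERATION
        if member_id not in group.members:
            # The old consumer's empty member id lands here after a rollback.
            return UNKNOWN_MEMBER_ID
        if generation_id != group.generation_id:
            return ILLEGAL_GENERATION
        return NONE


coordinator = ToyCoordinator()
# Before the migration: the group is unknown, so the old-style commit works.
before = coordinator.commit_offsets("g", "", -1)
# The new consumer joins, which registers the group.
generation = coordinator.join("g", "new-consumer-1")
# After the rollback: the same old-style commit is now rejected.
after = coordinator.commit_offsets("g", "", -1)
print(before, generation, after)
```

In this model the old consumer sends exactly the same request before and 
after the migration; only the coordinator's state has changed, which is why 
the failure is easy to miss on the client side.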

I have a workaround constraint to address the problem, but I'll leave that for 
when I send out the actual proposal.

> Group coordinator cache loading fails if groupId is used first for consumer 
> groups and then for simple consumer
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4704
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>             Fix For: 0.10.2.0
>
>
> When all the members in a consumer group have died and all of its offsets 
> have expired, we write a tombstone to __consumer_offsets so that its group 
> metadata is cleaned up. It is possible that after this happens, the same 
> groupId is then used only for offset storage (i.e. by "simple" consumers). 
> Our current cache loading logic, which is triggered when a coordinator first 
> takes over control of a partition, does not account for this scenario and 
> would currently fail.
> This is probably an unlikely scenario to hit in practice, but it reveals the 
> lack of test coverage around the cache loading logic. We should improve this.
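
The record sequence in the description can be replayed with a small toy 
loader. This is a sketch under stated assumptions: the record layout, 
{{load_cache}}, and the choice to keep offsets for a group that has no 
metadata are all hypothetical, and it is neither the broker's loading code 
nor the fix that went into 0.10.2.0.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record layout: (kind, group_id, key, value).
#   kind == "group":  key is None, value is the group metadata or None (tombstone)
#   kind == "offset": key is a partition name, value is an offset or None (expired)
Record = Tuple[str, str, Optional[str], Optional[object]]


def load_cache(records: Iterable[Record]):
    """Replay log records in order and rebuild the in-memory view."""
    groups: Dict[str, object] = {}
    offsets: Dict[str, Dict[str, int]] = {}
    for kind, group_id, key, value in records:
        if kind == "group":
            if value is None:
                # Tombstone: forget the group metadata.
                groups.pop(group_id, None)
            else:
                groups[group_id] = value
        elif kind == "offset":
            per_group = offsets.setdefault(group_id, {})
            if value is None:
                per_group.pop(key, None)
                if not per_group:
                    del offsets[group_id]
            else:
                # Offsets are kept even when no group metadata exists,
                # which is the "simple consumer" case from the description.
                per_group[key] = value
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return groups, offsets


log = [
    ("group", "g", None, {"generation": 1}),  # consumer group in use
    ("offset", "g", "topic-0", 5),
    ("offset", "g", "topic-0", None),         # offsets expire
    ("group", "g", None, None),               # group tombstone is written
    ("offset", "g", "topic-0", 42),           # same groupId, offset storage only
]
groups, offsets = load_cache(log)
print(groups, offsets)
```

A sequence like {{log}} is also a natural shape for a regression test of the 
cache loading path: the loader has to end up with offsets for a group that 
has no surviving metadata.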



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
