[ https://issues.apache.org/jira/browse/KAFKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840663#comment-15840663 ]
Onur Karaman commented on KAFKA-4704:
-------------------------------------
It might be worth mentioning that I hit a similar scenario while implementing
the new consumer migration. Rolling back from the migration-aware old consumer
(or just the new consumer) to the old consumer with Kafka-based offset storage
(or {{dual.commit.enabled}}) causes the old consumer's offset commits to fail
silently. By that point, the group has already been added to the
{{GroupCoordinator}} and its generation id has been incremented to be >= 0.
The old consumer, on the other hand, naively keeps sending
{{OffsetCommitRequests}} with an empty member id and a generation id of -1, so
the {{GroupCoordinator}} rejects each request with {{UNKNOWN_MEMBER_ID}}.
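
To make the failure mode concrete, here is a minimal sketch of the kind of
check described above. It is a simplified, hypothetical model and not the
actual {{GroupCoordinator}} code: the class {{OffsetCommitCheckSketch}}, its
methods, and the {{Result}} enum are illustrative names, and the rule that an
unknown group with a negative generation id is accepted as offset-storage-only
is an assumption about the broker's behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a coordinator-side offset commit check.
// This is NOT Kafka's GroupCoordinator; all names here are illustrative.
class OffsetCommitCheckSketch {
    enum Result { NONE, UNKNOWN_MEMBER_ID, ILLEGAL_GENERATION }

    // Minimal stand-in for the per-group state a coordinator tracks.
    static final class Group {
        final int generationId;
        final Set<String> memberIds;

        Group(int generationId, Set<String> memberIds) {
            this.generationId = generationId;
            this.memberIds = memberIds;
        }
    }

    private final Map<String, Group> groups = new HashMap<>();

    // Models the group being "added to the coordinator" by a new-style consumer.
    void addGroup(String groupId, int generationId, Set<String> memberIds) {
        groups.put(groupId, new Group(generationId, memberIds));
    }

    Result checkCommit(String groupId, String memberId, int generationId) {
        Group group = groups.get(groupId);
        if (group == null) {
            // Assumption: a group the coordinator has never seen, committing with
            // a negative generation, only uses Kafka for offset storage.
            return generationId < 0 ? Result.NONE : Result.ILLEGAL_GENERATION;
        }
        if (!group.memberIds.contains(memberId)) {
            // An empty member id never matches a registered member.
            return Result.UNKNOWN_MEMBER_ID;
        }
        if (generationId != group.generationId) {
            return Result.ILLEGAL_GENERATION;
        }
        return Result.NONE;
    }

    public static void main(String[] args) {
        OffsetCommitCheckSketch coordinator = new OffsetCommitCheckSketch();
        // Before any migration: the old consumer's commit is checked against an unknown group.
        System.out.println(coordinator.checkCommit("my-group", "", -1));
        // The new consumer joins, so the group now exists with generation 0.
        coordinator.addGroup("my-group", 0, Collections.singleton("member-1"));
        // After rollback: the old consumer sends the same request again.
        System.out.println(coordinator.checkCommit("my-group", "", -1));
    }
}
```

In this model the same request (empty member id, generation id -1) is
accepted before the group is registered and rejected afterwards. That matches
the rollback symptom above, assuming the old consumer does not surface the
error to the caller.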
I have a workaround constraint to address the problem, but I'll leave that for
when I send out the actual proposal.
> Group coordinator cache loading fails if groupId is used first for consumer
> groups and then for simple consumer
> ---------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-4704
> URL: https://issues.apache.org/jira/browse/KAFKA-4704
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Fix For: 0.10.2.0
>
>
> When all the members in a consumer group have died and all of its offsets
> have expired, we write a tombstone to __consumer_offsets so that its group
> metadata is cleaned up. It is possible that after this happens, the same
> groupId is then used only for offset storage (i.e. by "simple" consumers).
> Our current cache loading logic, which is triggered when a coordinator first
> takes over control of a partition, does not account for this scenario and
> would currently fail.
> This is probably an unlikely scenario to hit in practice, but it reveals the
> lack of test coverage around the cache loading logic. We should improve this.