[ https://issues.apache.org/jira/browse/KAFKA-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840663#comment-15840663 ]
Onur Karaman commented on KAFKA-4704:
-------------------------------------

It might be worth mentioning that I hit a similar scenario when implementing the new consumer migration. Rolling back from the migration-aware old consumer (or just new consumer) to the old consumer with kafka-based offset storage (or {{dual.commit.enabled}}) causes the old consumer offset commits to fail silently. This is because by that point, the group has been added to the {{GroupCoordinator}} and generation id has been incremented to be >= 0. The old consumer, on the other hand, is naively sending {{OffsetCommitRequests}} with an empty member id and generation id of -1, so the {{GroupCoordinator}} will reject the request with {{UNKNOWN_MEMBER_ID}}. I have a workaround constraint to address the problem, but I'll leave that for when I send out the actual proposal.

> Group coordinator cache loading fails if groupId is used first for consumer groups and then for simple consumer
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4704
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>             Fix For: 0.10.2.0
>
>
> When all the members in a consumer group have died and all of its offsets
> have expired, we write a tombstone to __consumer_offsets so that its group
> metadata is cleaned up. It is possible that after this happens, the same
> groupId is then used only for offset storage (i.e. by "simple" consumers).
> Our current cache loading logic, which is triggered when a coordinator first
> takes over control of a partition, does not account for this scenario and
> would currently fail.
> This is probably an unlikely scenario to hit in practice, but it reveals the
> lack of test coverage around the cache loading logic. We should improve this.
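
For illustration, the rejection described in the comment can be modelled roughly as below. This is a minimal Java sketch under stated assumptions, not the actual {{GroupCoordinator}} code, which handles more group states than shown here. The class {{OffsetCommitCheckSketch}}, its methods and the {{CommitError}} enum are hypothetical names, and the handling of a group the coordinator has never seen (accept commits with a negative generation id, otherwise {{ILLEGAL_GENERATION}}) is an assumption about the validation order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the coordinator-side commit check.
class OffsetCommitCheckSketch {
    enum CommitError { NONE, UNKNOWN_MEMBER_ID, ILLEGAL_GENERATION }

    // Minimal stand-in for the coordinator's per-group metadata.
    static final class Group {
        final int generationId;
        final Set<String> memberIds;

        Group(int generationId, Set<String> memberIds) {
            this.generationId = generationId;
            this.memberIds = memberIds;
        }
    }

    private final Map<String, Group> groups = new HashMap<>();

    // Register a group that has completed at least one rebalance.
    void addGroup(String groupId, int generationId, String... memberIds) {
        groups.put(groupId, new Group(generationId, new HashSet<>(Arrays.asList(memberIds))));
    }

    CommitError handleCommit(String groupId, String memberId, int generationId) {
        Group group = groups.get(groupId);
        if (group == null) {
            // Assumption: a group unknown to the coordinator is used for offset
            // storage only, so a commit with a negative generation id is accepted.
            return generationId < 0 ? CommitError.NONE : CommitError.ILLEGAL_GENERATION;
        }
        if (!group.memberIds.contains(memberId)) {
            // An empty member id is never a member of a coordinator-managed group.
            return CommitError.UNKNOWN_MEMBER_ID;
        }
        if (generationId != group.generationId) {
            return CommitError.ILLEGAL_GENERATION;
        }
        return CommitError.NONE;
    }

    public static void main(String[] args) {
        OffsetCommitCheckSketch coordinator = new OffsetCommitCheckSketch();

        // Before the migration: the coordinator has no metadata for the group.
        System.out.println(coordinator.handleCommit("my-group", "", -1));

        // After a new consumer has joined, the group exists with generation >= 0.
        coordinator.addGroup("my-group", 1, "consumer-1");

        // After rollback: the old consumer still sends an empty member id and -1.
        System.out.println(coordinator.handleCommit("my-group", "", -1));
    }
}
```

In this model the first commit succeeds and the second is rejected with {{UNKNOWN_MEMBER_ID}}, which matches the silent failure seen after rolling back to the old consumer.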
-- This message was sent by Atlassian JIRA (v6.3.4#6332)