[ https://issues.apache.org/jira/browse/KAFKA-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691169#comment-15691169 ]
Jason Gustafson commented on KAFKA-4435:
----------------------------------------
cc [~onurkaraman]
> Improve storage overhead of group metadata
> ------------------------------------------
>
> Key: KAFKA-4435
> URL: https://issues.apache.org/jira/browse/KAFKA-4435
> Project: Kafka
> Issue Type: Improvement
> Components: consumer
> Reporter: Jason Gustafson
>
> The GroupMetadataManager serializes the full subscriptions and assignments of
> all consumer group members for each generation as a single message. This is a
> problem for large consumer groups with a large number of topics since each
> member's subscription/assignment is serialized separately. So if you have n
> consumers each subscribing to the same m topics, then the serialized message
> will contain m*n subscribed topics. At a certain size, you end up exceeding
> the max message size.
> Some ideas for getting around this have been 1) turning on compression and 2)
> adding regex support to the protocol. Both of these help, but maybe we should
> question whether the subscriptions/assignments need to be written at all. The
> reason to include this information in the log is basically to prevent a
> rebalance on coordinator failover. After failover, the new coordinator can
> consume the log and determine the full state of every group. The consumers in
> the group simply send heartbeats to the new coordinator, once it is found.
> In fact, preventing the rebalance is not really the main issue: it's ensuring
> that the last generation can commit its offsets. If nothing were written to
> the log, then the group would be recreated after failover from scratch and
> existing members would not be able to commit offsets (since their generation
> would no longer be valid). But the subscription/assignment is opaque to the
> coordinator and is not actually used when committing offsets. All it really
> needs is the generation and the list of memberIds.
> Supposing then that we removed the subscriptions/assignments from the group,
> but retained the generation/memberId information, one loose end is servicing
> the DescribeGroups request. After failover, we would no longer have the
> subscription/assignment information we need to answer that request. One
> option would be to trigger a rebalance after failover in order to repopulate
> it. The previous generation would still be able to commit offsets before
> rejoining the group. Potentially we could even delay this rebalance until we
> actually receive a DescribeGroups request.
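
To make the m*n growth and the proposed reduction concrete, here is a rough sketch in Python. It is not Kafka's serialization format or coordinator code: the function name, the SlimGroupState class and the can_commit check are hypothetical, and they only illustrate the idea from the description that the generation and the list of memberIds are enough to validate an offset commit after failover.

```python
from dataclasses import dataclass


def full_metadata_topic_entries(num_members: int, num_topics: int) -> int:
    """Count the topic names written when each member's subscription
    is serialized separately (n members x m topics)."""
    return num_members * num_topics


@dataclass(frozen=True)
class SlimGroupState:
    """Hypothetical reduced group record: no subscriptions or assignments,
    only the generation and the member ids."""
    generation: int
    member_ids: frozenset

    def can_commit(self, member_id: str, generation: int) -> bool:
        # Assumed rule: accept a commit only from a known member
        # of the current generation.
        return generation == self.generation and member_id in self.member_ids


if __name__ == "__main__":
    # 500 consumers, each subscribed to the same 1000 topics.
    print(full_metadata_topic_entries(500, 1000))

    state = SlimGroupState(7, frozenset({"consumer-1", "consumer-2"}))
    print(state.can_commit("consumer-1", 7))  # known member, current generation
    print(state.can_commit("consumer-1", 6))  # stale generation
```

The slim record grows with the number of members only, not with members times topics, which is the point of dropping the subscriptions and assignments from the log.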