dajac commented on code in PR #18020: URL: https://github.com/apache/kafka/pull/18020#discussion_r1872790894
##########
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/modern/consumer/ConsumerGroup.java:
##########
@@ -798,6 +827,60 @@ private void validateMemberEpoch(
         }
     }
 
+    /**
+     * Computes the subscription type based on the provided information.
+     *
+     * @param subscribedRegularExpressions  The subscribed regular expression count.
+     * @param subscribedTopicNames          The subscribed topic name count.
+     * @param numberOfMembers               The number of members in the group.
+     *
+     * @return The subscription type.
+     */
+    public static SubscriptionType subscriptionType(
+        Map<String, Integer> subscribedRegularExpressions,
+        Map<String, SubscriptionCount> subscribedTopicNames,
+        int numberOfMembers
+    ) {
+        if (subscribedRegularExpressions.isEmpty()) {
+            // If the members do not use regular expressions, the subscription is
+            // considered as homogeneous if all the members are subscribed to the
+            // same topics. Otherwise, it is considered as heterogeneous.
+            for (SubscriptionCount subscriberCount : subscribedTopicNames.values()) {
+                if (subscriberCount.byNameCount != numberOfMembers) {
+                    return HETEROGENEOUS;
+                }
+            }
+            return HOMOGENEOUS;
+        } else {
+            int count = subscribedRegularExpressions.values().iterator().next();
+            if (count == numberOfMembers) {
+                // If all the members are subscribed to a single regular expressions
+                // and none of them are subscribed to topic names, the subscription
+                // is considered as homogeneous. If some members are subscribed to
+                // topic names too, the subscription is considered as heterogeneous.
+                for (SubscriptionCount subscriberCount : subscribedTopicNames.values()) {
+                    if (subscriberCount.byRegexCount != 1 || subscriberCount.byNameCount > 0) {
+                        return HETEROGENEOUS;

Review Comment:
The definition is not that well defined. I think that we have the choice between two definitions:
1) All the members use the same subscription; or
2) All the members are subscribed to the same topics.

In this patch, I suggest using 1), though I agree that 2) would be the best. The challenge with 2) is that it is not easy to compute. Imagine the following:
* 6 members, 2 topics `foo` and `fooo`
* 3 members subscribed via name `foo`
* 1 member subscribed via regex `foo.*`
* 1 member subscribed via regex `fo.*`
* 1 member subscribed via regex `.*` and via name `foo`

It should be homogeneous too because they are all subscribed to `foo` and `fooo`. However, it is hard to compute this from the information that we have in memory. Our data model makes our life hard here. I am open to suggestions though.
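
To make the gap between the two definitions concrete, here is a minimal sketch, not the coordinator's implementation. `Member`, `sameSubscription`, `resolve` and `sameTopics` are hypothetical names introduced only for illustration, and `java.util.regex` stands in for whatever regex engine the broker actually uses. It runs a reduced variant of the scenario above (four members, topics `foo` and `fooo`) in which every member declares a different subscription yet all of them resolve to the same topics.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class SubscriptionDefinitions {

    /** Hypothetical stand-in for one member's subscription; regex may be null. */
    public record Member(Set<String> topicNames, String regex) {
        public Member {
            // Defensive immutable copy so record equality compares set contents.
            topicNames = Set.copyOf(topicNames);
        }
    }

    /** Definition 1: every member declares exactly the same names and regex. */
    public static boolean sameSubscription(List<Member> members) {
        return members.stream().distinct().count() <= 1;
    }

    /** Expands one subscription into the concrete topics it covers. */
    public static Set<String> resolve(Member member, Set<String> existingTopics) {
        Set<String> resolved = new HashSet<>(member.topicNames());
        if (member.regex() != null) {
            Pattern pattern = Pattern.compile(member.regex());
            for (String topic : existingTopics) {
                // matches() requires the whole topic name to match the regex.
                if (pattern.matcher(topic).matches()) {
                    resolved.add(topic);
                }
            }
        }
        return resolved;
    }

    /** Definition 2: every member ends up subscribed to the same set of topics. */
    public static boolean sameTopics(List<Member> members, Set<String> existingTopics) {
        return members.stream()
            .map(member -> resolve(member, existingTopics))
            .distinct()
            .count() <= 1;
    }

    public static void main(String[] args) {
        Set<String> topics = Set.of("foo", "fooo");
        List<Member> members = List.of(
            new Member(Set.of("foo", "fooo"), null),  // by name only
            new Member(Set.of(), "foo.*"),            // by regex only
            new Member(Set.of(), "fo.*"),             // by a different regex
            new Member(Set.of("foo"), ".*")           // by regex and by name
        );
        System.out.println("Definition 1: " + sameSubscription(members));   // false
        System.out.println("Definition 2: " + sameTopics(members, topics)); // true
    }
}
```

Definition 2 needs each member's resolved topic set, which this sketch recomputes per member. The aggregated per-topic and per-regex counters passed to `subscriptionType` do not carry that information, which is why 1) is the cheaper check.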