[ https://issues.apache.org/jira/browse/KAFKA-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewen Cheslack-Postava resolved KAFKA-3810.
------------------------------------------
    Resolution: Fixed
 Fix Version/s: 0.10.1.0

Issue resolved by pull request 1484
[https://github.com/apache/kafka/pull/1484]

> replication of internal topics should not be limited by replica.fetch.max.bytes
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-3810
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3810
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>             Fix For: 0.10.1.0
>
>
> From the kafka-dev mailing list discussion: [\[DISCUSS\] scalability limits in the coordinator|http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccamquqbzddtadhcgl6h4smtgo83uqt4s72gc03b3vfghnme3...@mail.gmail.com%3E]
> There's a scalability limit on the new consumer / coordinator regarding the amount of group metadata we can fit into one message. This restricts a combination of consumer group size, topic subscription sizes, topic assignment sizes, and any remaining member metadata.
> Under more strenuous use cases like mirroring clusters with thousands of topics, this limitation can be reached even after applying gzip to the __consumer_offsets topic.
> Various options were proposed in the discussion:
> # Config change: reduce the number of consumers in the group. This isn't always a realistic answer in more strenuous use cases like MirrorMaker clusters or for auditing.
> # Config change: split the group into smaller groups which together cover the full set of topics. This gives each group member a smaller subscription (ex: g1 has topics starting with a-m while g2 has topics starting with n-z). This would be operationally painful to manage.
> # Config change: split the topics among members of the group. Again, this gives each group member a smaller subscription. This would also be operationally painful to manage.
> # Config change: bump up KafkaConfig.messageMaxBytes (a topic-level config) and KafkaConfig.replicaFetchMaxBytes (a broker-level config). Applying messageMaxBytes to just the __consumer_offsets topic seems relatively harmless, but bumping up the broker-level replicaFetchMaxBytes would probably need more attention (see the config sketch after this list).
> # Config change: try different compression codecs. Based on 2 minutes of googling, it seems like lz4 and snappy are faster than gzip but have worse compression, so this probably won't help.
> # Implementation change: support sending the regex over the wire instead of the fully expanded topic subscriptions. I think people said in the past that different languages have subtle differences in regex, so this doesn't play nicely with cross-language groups.
> # Implementation change: maybe we can reverse the mapping? Instead of mapping from member to subscriptions, we can map a subscription to a list of members (see the Java sketch after this list).
> # Implementation change: maybe we can try to break apart the subscription and assignments from the same SyncGroupRequest into multiple records? They can still go to the same message set and get appended together. This way the limit becomes the segment size, which shouldn't be a problem. This can be tricky to get right because we're currently keying these messages on the group, so records from the same rebalance might accidentally compact one another, but my understanding of compaction isn't that great.
> # Implementation change: try to apply some tricks on the assignment serialization to make it smaller.
> # Config and Implementation change: bump up the __consumer_offsets topic messageMaxBytes and (from [~junrao]) fix how we deal with the case when a message is larger than the fetch size. Today, if a message is larger than the fetch size, the consumer will get stuck. Instead, we can simply return the full message if it's larger than the fetch size, without requiring the consumer to manually adjust the fetch size.
> # Config and Implementation change: same as above, but only apply the special fetch logic when fetching from internal topics.
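
For concreteness, here is a minimal sketch of the config-change workaround from option 4. The ZooKeeper address and the 2 MB value below are illustrative assumptions, not values from the discussion; the topic-level override for __consumer_offsets is max.message.bytes, and the corresponding broker-level fetch limit is replica.fetch.max.bytes:

    # Topic-level override on the internal offsets topic (illustrative 2 MB):
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name __consumer_offsets \
      --add-config max.message.bytes=2097152

    # Broker-level setting in server.properties; must be at least as large as
    # the biggest message any replica has to fetch:
    replica.fetch.max.bytes=2097152

Note that the broker-level bump applies to replica fetches for every topic, which is the motivation for this issue's title: replication of internal topics should not be limited by replica.fetch.max.bytes.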
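
And a rough sketch of the reverse-mapping idea from option 7, in plain Java (the class and method names are hypothetical, not Kafka code): keying the metadata by subscription instead of by member stores each distinct topic set once, which helps most when many members, as in a MirrorMaker group, share an identical subscription.

    import java.util.*;

    // Hypothetical illustration of option 7: invert member -> topics into
    // topics -> members so that identical subscriptions are stored once.
    public class SubscriptionIndex {
        static Map<Set<String>, List<String>> invert(Map<String, Set<String>> memberToTopics) {
            Map<Set<String>, List<String>> topicsToMembers = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : memberToTopics.entrySet())
                topicsToMembers.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            return topicsToMembers;
        }

        public static void main(String[] args) {
            // In a mirroring group, every member subscribes to the same
            // (potentially huge) topic set.
            Set<String> allTopics = new TreeSet<>(Arrays.asList("topicA", "topicB", "topicC"));
            Map<String, Set<String>> memberToTopics = new HashMap<>();
            memberToTopics.put("consumer-1", allTopics);
            memberToTopics.put("consumer-2", allTopics);
            // Prints a single entry mapping the shared topic set to both members.
            System.out.println(invert(memberToTopics));
        }
    }

With N members sharing one subscription, the inverted form stores the topic list once plus N member ids, rather than N copies of the topic list.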