[ https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909314#comment-16909314 ]
Vinoth Chandar commented on KAFKA-7149:
---------------------------------------

To close the loop on turning on compression for the backing `__consumer_offsets` topic: it looks like this can already be controlled by `offsets.topic.compression.codec`. Please correct me if I am missing something.

In code: GroupMetadataManager::storeGroup()

{code}
def storeGroup(group: GroupMetadata,
               groupAssignment: Map[String, Array[Byte]],
               responseCallback: Errors => Unit): Unit = {
  getMagic(partitionFor(group.groupId)) match {
    case Some(magicValue) =>
      ...
      val key = GroupMetadataManager.groupMetadataKey(group.groupId)
      val value = GroupMetadataManager.groupMetadataValue(group, groupAssignment, interBrokerProtocolVersion)

      val records = {
        val buffer = ByteBuffer.allocate(AbstractRecords.estimateSizeInBytes(magicValue, compressionType,
          Seq(new SimpleRecord(timestamp, key, value)).asJava))
        val builder = MemoryRecords.builder(buffer, magicValue, compressionType, timestampType, 0L)
        builder.append(timestamp, key, value)
        builder.build()
      }

      val groupMetadataPartition = new TopicPartition(Topic.GROUP_METADATA_TOPIC_NAME, partitionFor(group.groupId))
      val groupMetadataRecords = Map(groupMetadataPartition -> records)
      val generationId = group.generationId
{code}

> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.0.0
>            Reporter: Ashish Surana
>            Assignee: Vinoth Chandar
>            Priority: Major
>
> We observed that with a high number of partitions, instances, or stream threads, the assignment data size grows quickly and we start hitting RecordTooLargeException at the Kafka broker.
> A workaround for this issue is described at: https://issues.apache.org/jira/browse/KAFKA-6976
> Even with that workaround, the scalability of Kafka Streams is limited, since moving around ~100 MB of assignment data for each rebalance hurts performance and reliability (timeout exceptions start appearing). It also caps Kafka Streams scale even with a high max.message.bytes setting, because the data size grows quickly with the number of partitions, instances, or stream threads.
>
> Solution:
> To address this issue in our cluster, we are sending the assignment data compressed. We saw the assignment data size reduced by 8x-10x. This drastically improved Kafka Streams scalability for us, and we could now run with more than 8,000 partitions.
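For illustration only (not from the report above): a minimal sketch of the kind of approach the solution describes, namely deflating the serialized assignment bytes before they travel through the rebalance protocol and inflating them on the receiving member. It uses only java.util.zip from the JDK; the helper names (compressAssignment / decompressAssignment) are invented for this example and are not the actual Kafka Streams implementation.

{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object AssignmentCompression {

  // Deflate the serialized assignment before embedding it in the rebalance metadata.
  def compressAssignment(assignment: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(bos)
    try gzip.write(assignment) finally gzip.close()
    bos.toByteArray
  }

  // Inflate the assignment bytes on the receiving member.
  def decompressAssignment(compressed: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(compressed))
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n = in.read(buf)
    while (n != -1) {
      out.write(buf, 0, n)
      n = in.read(buf)
    }
    in.close()
    out.toByteArray
  }
}
{code}

Assignment payloads are highly repetitive (topic names, host strings), so they compress well, which is consistent with the 8x-10x reduction reported above; a real fix would also need a version flag so that old and new members can interoperate during a rolling upgrade.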