[ https://issues.apache.org/jira/browse/KAFKA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kamal Chandraprakash resolved KAFKA-8547.
-----------------------------------------
Resolution: Duplicate
Duplicate of https://issues.apache.org/jira/browse/KAFKA-8335; see PR [https://github.com/apache/kafka/pull/6715].
> 2 __consumer_offsets partitions grow very big
> ---------------------------------------------
>
> Key: KAFKA-8547
> URL: https://issues.apache.org/jira/browse/KAFKA-8547
> Project: Kafka
> Issue Type: Bug
> Components: log cleaner
> Affects Versions: 2.1.1
> Environment: Ubuntu 18.04, Kafka 2.12-2.1.1, running as systemd service
> Reporter: Lerh Chuan Low
> Priority: Major
>
> It seems like the log cleaner doesn't clean old data in {{__consumer_offsets}}
> under that topic's default {{compact}} cleanup policy. Left unchecked, this can
> eventually fill the disk or cause the servers to run out of memory.
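> For context, one way to confirm the topic's current policy is to describe its
> config overrides (a sketch, assuming a ZooKeeper connect string of
> localhost:2181; adjust for your cluster):
> {code:bash}
> # Show topic-level config overrides for the internal offsets topic.
> # By default it carries cleanup.policy=compact, so segments are only
> # ever compacted and never removed by time/size retention.
> ./kafka-configs.sh --zookeeper localhost:2181 \
>     --entity-type topics --entity-name __consumer_offsets --describe
> {code}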
> We observed a few out-of-memory errors on our Kafka servers, and our theory was
> that they were caused by 2 overly large partitions of {{__consumer_offsets}}.
> On further digging, it looks like these 2 large partitions have segments dating
> back up to 3 months, and those old files collectively account for most of the
> data in the partition (about 10G of the partition's 12G).
> When we dumped those old segments, we saw:
>
> {code:java}
> 1:40 $ ./kafka-run-class.sh kafka.tools.DumpLogSegments --files 00000000161728257775.log --offsets-decoder --print-data-log --deep-iteration
> Dumping 00000000161728257775.log
> Starting offset: 161728257775
> offset: 161728257904 position: 61 CreateTime: 1553457816168 isvalid: true keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 367038 producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] endTxnMarker: COMMIT coordinatorEpoch: 746
> offset: 161728258098 position: 200 CreateTime: 1553457816230 isvalid: true keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 366036 producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] endTxnMarker: COMMIT coordinatorEpoch: 761
> ...{code}
> It looks like those old segments all contain transactional information. (As a
> side note, it took us a while to figure out that for a batch with the control
> bit set, the dumped key really is {{endTxnMarker}} and the value is
> {{coordinatorEpoch}}; a non-control batch dump would show the key and payload
> instead. We were wondering whether looking at the keys those 2 partitions
> contain might give us any clues.) Our current workaround is based on this
> comment:
> https://issues.apache.org/jira/browse/KAFKA-3917?focusedCommentId=16816874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16816874.
> We set the cleanup policy to {{compact,delete}} (sketched below) and the
> partition quickly shrank to below 2G. We're not sure whether this is something
> the log cleaner should be able to handle normally. Interestingly, other
> partitions also contain transactional information, so it's curious that these
> 2 specific partitions could not be cleaned.
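> For reference, the workaround amounted to something like the following (a
> sketch, assuming a ZooKeeper connect string of localhost:2181; adjust for your
> cluster):
> {code:bash}
> # Add "delete" alongside "compact" so segments older than the retention
> # settings become eligible for deletion as well as compaction. Note the
> # brackets: kafka-configs.sh needs them for comma-separated list values.
> ./kafka-configs.sh --zookeeper localhost:2181 \
>     --entity-type topics --entity-name __consumer_offsets \
>     --alter --add-config 'cleanup.policy=[compact,delete]'
> {code}
> Be aware that with {{delete}} in the policy, offsets for groups that have not
> committed within the retention window can be removed, so this is a workaround
> rather than a proper fix.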
> There's a related issue here: https://issues.apache.org/jira/browse/KAFKA-3917,
> but it looked a little outdated/dead, so I opened a new one; please feel free
> to merge!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)