[ https://issues.apache.org/jira/browse/KAFKA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kamal Chandraprakash resolved KAFKA-8547.
-----------------------------------------
    Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-8335

PR [https://github.com/apache/kafka/pull/6715]

> 2 __consumer_offsets partitions grow very big
> ---------------------------------------------
>
>                 Key: KAFKA-8547
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8547
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.1.1
>         Environment: Ubuntu 18.04, Kafka 2.12-2.1.1, running as a systemd
> service
>            Reporter: Lerh Chuan Low
>            Priority: Major
>
> It seems like the log cleaner doesn't clean old data out of {{__consumer_offsets}}
> under that topic's default {{compact}} policy. This may eventually cause the
> disk to fill up or the servers to run out of memory.
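> As a quick sanity check, the topic's cleanup policy can be inspected with
> {{kafka-configs.sh}}. A minimal sketch, assuming a ZooKeeper ensemble at
> {{localhost:2181}} (substitute your own):
> {code:java}
> # Show the topic-level config overrides for __consumer_offsets;
> # by default this internal topic is created with cleanup.policy=compact.
> ./kafka-configs.sh --zookeeper localhost:2181 --describe \
>     --entity-type topics --entity-name __consumer_offsets
> {code}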
> We observed a few out-of-memory errors on our Kafka servers, and our theory
> was that they were caused by 2 overly large partitions in {{__consumer_offsets}}.
> On further digging, these 2 large partitions turned out to hold segments dating
> back up to 3 months. These old files also accounted for most of the data in
> those partitions (about 10G of each partition's 12G).
> When we dumped those old segments, we saw:
>  
> {code:java}
> 1:40 $ ./kafka-run-class.sh kafka.tools.DumpLogSegments --files \
>     00000000161728257775.log --offsets-decoder --print-data-log --deep-iteration
>  Dumping 00000000161728257775.log
>  Starting offset: 161728257775
>  offset: 161728257904 position: 61 CreateTime: 1553457816168 isvalid: true 
> keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 367038 
> producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] 
> endTxnMarker: COMMIT coordinatorEpoch: 746
>  offset: 161728258098 position: 200 CreateTime: 1553457816230 isvalid: true 
> keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 366036 
> producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] 
> endTxnMarker: COMMIT coordinatorEpoch: 761
>  ...{code}
> It looks like those old segments all contain transactional information. (As a
> side note, it took us a while to figure out that for a batch with the control
> bit set, the key is printed as {{endTxnMarker}} and the value as
> {{coordinatorEpoch}}; a non-control batch dump would show a key and payload
> instead. We were wondering if seeing what those 2 partitions held in their
> keys might give us any clues.) Our current workaround is based on this
> comment:
> https://issues.apache.org/jira/browse/KAFKA-3917?focusedCommentId=16816874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16816874.
> We set the cleanup policy to {{compact,delete}} (see the sketch below), and
> the partition very quickly dropped to below 2G. Not sure if this is something
> the log cleaner should be able to handle on its own? Interestingly, other
> partitions also contain transactional information, so it's quite curious that
> these 2 specific partitions could not be cleaned.
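> For reference, a minimal sketch of that workaround, again assuming ZooKeeper
> at {{localhost:2181}}; the retention value is a placeholder, not what we used:
> {code:java}
> # Add delete alongside compact so fully expired segments can be removed
> # outright; square brackets are required because the value contains a comma.
> ./kafka-configs.sh --zookeeper localhost:2181 --alter \
>     --entity-type topics --entity-name __consumer_offsets \
>     --add-config 'cleanup.policy=[compact,delete],retention.ms=604800000'
> {code}
> With {{delete}} enabled, segments older than {{retention.ms}} (7 days here)
> become eligible for removal even if compaction never touches them.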
> There's a related issue here:
> https://issues.apache.org/jira/browse/KAFKA-3917; it just seemed a little
> outdated/dead, so I opened a new one. Please feel free to merge!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
