[
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382511#comment-15382511
]
Peter Davis commented on KAFKA-3894:
------------------------------------
Re: "the broker seems to be working"
You may regret not taking action now. As Tim mentioned from the talk at the
Kafka Summit
(http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49), if
__consumer_offsets is not compacted and has accumulated millions (or billions!)
of messages, it can take many minutes for the broker to elect a new coordinator
after any kind of hiccup. *Your new consumers may be hung during this time!*
However, even shutting down brokers to change the configuration will cause
coordinator elections which will cause an outage. It seems like not having a
"hot spare" for Offset Managers is a liability here...
We were bitten by this bug and it caused all kinds of headaches until we managed
to get __consumer_offsets cleaned up again.
> Log Cleaner thread crashes and never restarts
> ---------------------------------------------
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
> Reporter: Tim Carey-Smith
> Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be
> too large to fit into the dedupe buffer.
> The result of this is a log line:
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner
> \[kafka-log-cleaner-thread-0\], Error due to
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in
> segment MY_FAVORITE_TOPIC-2/00000000000047580165.log but offset map can fit
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease
> log.cleaner.threads
> {quote}
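
The figure 5033164 in the error above can be reproduced with a little arithmetic, which helps when deciding how far to raise log.cleaner.dedupe.buffer.size. The sketch below is a rough estimate and not the broker's own code. It assumes 24 bytes per offset-map entry (a 16-byte hash plus an 8-byte offset), a load factor of 0.9, and the 128 MiB default dedupe buffer shared evenly across cleaner threads. None of those values are stated in the report, and the function names are made up for illustration.

```python
import math

# Assumption: each offset-map entry costs a 16-byte key hash plus an 8-byte offset.
BYTES_PER_ENTRY = 16 + 8


def offset_map_capacity(dedupe_buffer_bytes, cleaner_threads=1, load_factor=0.9):
    """Estimate how many entries one cleaner thread's offset map can hold."""
    # Assumption: the dedupe buffer is divided evenly among the cleaner threads.
    per_thread_bytes = dedupe_buffer_bytes // cleaner_threads
    slots = per_thread_bytes // BYTES_PER_ENTRY
    # Only a fraction of the slots (the load factor) is allowed to fill up.
    return int(slots * load_factor)


def min_dedupe_buffer_bytes(entries, cleaner_threads=1, load_factor=0.9):
    """Estimate the smallest total dedupe buffer that fits `entries` per thread."""
    slots = math.ceil(entries / load_factor)
    # Guard against floating-point rounding leaving us one slot short.
    while int(slots * load_factor) < entries:
        slots += 1
    return slots * BYTES_PER_ENTRY * cleaner_threads


if __name__ == "__main__":
    # 128 MiB buffer and one cleaner thread (assumed defaults).
    print(offset_map_capacity(134217728))    # 5033164, as in the log line
    # Buffer needed for the 9750860-message segment named in the error.
    print(min_dedupe_buffer_bytes(9750860))
```

Under these assumptions the reported limit matches a broker running with the default buffer and a single cleaner thread. The segment in the error would need roughly twice that buffer, or more if log.cleaner.threads is raised.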
> As a result, the broker is left in a potentially dangerous situation where
> cleaning of compacted topics is not running.
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason
> for this upper bound, or if this is a value which must be tuned for each
> specific use-case.
> Of more immediate concern is the fact that the thread crash is not visible
> via JMX or exposed as some form of service degradation.
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM
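
Since the crash is not surfaced through JMX, one way to approximate the second remediation from outside the JVM is to scan a thread dump for the cleaner thread name that appears in the log line above (kafka-log-cleaner-thread-0). The sketch below is an assumption about how such a check could look, not what the reporter actually ran. It expects a dump in the usual jstack layout, where each thread header starts with the thread name in double quotes. The function names and the expected thread count are hypothetical.

```python
import re

# Assumption: each thread header line in the dump begins with "thread-name".
THREAD_HEADER = re.compile(r'^"([^"]+)"', re.MULTILINE)


def log_cleaner_threads(thread_dump, prefix="kafka-log-cleaner-thread-"):
    """Return the names of log-cleaner threads found in a thread dump."""
    names = THREAD_HEADER.findall(thread_dump)
    return sorted(name for name in names if name.startswith(prefix))


def cleaner_is_running(thread_dump, expected_threads=1):
    """True if at least `expected_threads` cleaner threads are still alive."""
    return len(log_cleaner_threads(thread_dump)) >= expected_threads


if __name__ == "__main__":
    # Hand-written sample dump for illustration only, not real broker output.
    sample = (
        '"kafka-log-cleaner-thread-0" #45 prio=5 waiting on condition\n'
        '   java.lang.Thread.State: TIMED_WAITING\n'
        '\n'
        '"kafka-request-handler-1" #30 daemon prio=5 waiting on condition\n'
    )
    print(cleaner_is_running(sample))
```

The dump text could come from running jstack against the broker process on a schedule. An alert would fire when cleaner_is_running returns False, with expected_threads set to the broker's log.cleaner.threads value.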