[ https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406970#comment-15406970 ]

Jun Rao commented on KAFKA-3894:
--------------------------------

[~tcrayford-heroku], I chatted with [~jkreps] on this a bit. There are a couple 
of things we can do to address this issue.

a. We can potentially make the allocation of the dedup buffer more dynamic. We 
can start with something small, like 100MB, and grow the dedup buffer up to the 
configured size as needed. This would let us ship a larger default dedup buffer 
size (say 1GB): if there aren't many keys, the broker won't actually use that 
much memory, while the default configuration can still accommodate more keys.
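
To make (a) a bit more concrete, here is a rough sketch of a dedup map whose 
committed memory grows on demand. GrowableOffsetMap and everything inside it 
are made-up names for illustration; the real SkimpyOffsetMap packs 24-byte 
entries (16-byte MD5 plus 8-byte offset) into a pre-allocated ByteBuffer, so a 
real implementation would reallocate and rehash that buffer when it grows:

{code}
import java.nio.ByteBuffer
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical sketch of (a): a dedup map whose committed capacity starts small
// and doubles, up to a configured maximum, instead of being allocated up front.
class GrowableOffsetMap(initialBytes: Int, maxBytes: Int) {
  private val bytesPerEntry = 24                            // 16-byte MD5 hash + 8-byte offset
  private val entries = mutable.Map.empty[ByteBuffer, Long] // stand-in for the hashed slots
  private var committedBytes = initialBytes                 // memory we have "allocated" so far

  private def md5(key: Array[Byte]): ByteBuffer =
    ByteBuffer.wrap(MessageDigest.getInstance("MD5").digest(key))

  /** Record the latest offset for a key. Returns false only when the map is
    * full even after growing to maxBytes, i.e. the cleaner must stop here. */
  def put(key: Array[Byte], offset: Long): Boolean = {
    val hashed = md5(key)
    if (!entries.contains(hashed) && (entries.size + 1) * bytesPerEntry > committedBytes) {
      if (committedBytes >= maxBytes) return false
      // Grow on demand rather than pre-allocating maxBytes up front.
      committedBytes = math.min(committedBytes.toLong * 2, maxBytes.toLong).toInt
    }
    entries(hashed) = offset
    true
  }

  def size: Int = entries.size
}
{code}

With something like this, the configured dedupe buffer size is just an upper 
bound rather than memory every cleaner thread commits up front, which is what 
makes shipping a 1GB default palatable.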

b. To handle the edge case where a segment still has more keys than even the 
increased dedup buffer can hold, we can take the #3 approach you suggested. 
Basically, if the dedup buffer fills up when only part of a segment has been 
loaded, we remember the next offset (say L). We then scan all old log segments, 
including this one, as before. The only difference is that when scanning the 
last segment, we force the creation of a new segment starting at offset L and 
simply copy the existing messages after L into it. Then, after we have swapped 
in the new segments, we move the cleaner marker to offset L. This adds a bit of 
inefficiency, since we have to scan the last swapped-in segment again, but it 
allows the cleaner to always make progress regardless of the number of keys. I 
am not sure I understand the case you mentioned that wouldn't work with either 
approach #3 or #4.
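
To make (b) concrete, here is a rough sketch of a single cleaning pass under 
that rule. Record, Segment, cleanWithPartialMap and maxMapEntries are 
simplified, made-up stand-ins, not the actual kafka.log classes:

{code}
// Sketch of approach (b): if the dedup map fills up partway through the range,
// stop at the first offset (L) that did not fit, dedup everything below L,
// copy the records at or above L verbatim into a new segment based at L, and
// advance the cleaner checkpoint only to L.
case class Record(offset: Long, key: String, value: String)
case class Segment(baseOffset: Long, records: Seq[Record])

def cleanWithPartialMap(segments: Seq[Segment], maxMapEntries: Int): (Seq[Segment], Long) = {
  val all = segments.flatMap(_.records)
  require(all.nonEmpty, "nothing to clean")

  // Build the key -> latest-offset map, stopping once it is full and
  // remembering the first offset (L) that did not fit.
  var offsetMap = Map.empty[String, Long]
  var fullAt: Option[Long] = None
  for (r <- all if fullAt.isEmpty) {
    if (offsetMap.size >= maxMapEntries && !offsetMap.contains(r.key))
      fullAt = Some(r.offset)            // this is L
    else
      offsetMap += (r.key -> r.offset)
  }
  val endOffset = fullAt.getOrElse(all.last.offset + 1) // L, or one past the end

  // Below L: keep only the latest occurrence of each key (every key below L is in the map).
  val cleanedHead = all.filter(r => r.offset < endOffset && offsetMap(r.key) == r.offset)
  // At or above L: copied verbatim into a forced new segment starting at L.
  val copiedTail = all.filter(_.offset >= endOffset)

  val newSegments =
    Segment(segments.head.baseOffset, cleanedHead) +:
      (if (copiedTail.nonEmpty) Seq(Segment(endOffset, copiedTail)) else Nil)

  // The checkpoint advances only to L, so the copied tail is re-scanned next round.
  (newSegments, endOffset)
}
{code}

The copied tail is the same bytes it was before, so nothing is lost when the 
map is too small; the only cost is that the segment starting at L gets read 
again on the next cleaning round, which is the inefficiency mentioned above.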

> Log Cleaner thread crashes and never restarts
> ---------------------------------------------
>
>                 Key: KAFKA-3894
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3894
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2.2, 0.9.0.1
>         Environment: Oracle JDK 8
> Ubuntu Precise
>            Reporter: Tim Carey-Smith
>              Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/00000000000047580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
