[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971740#comment-14971740 ]

Todd Palino commented on KAFKA-2235:
------------------------------------

I'm sure [~jjkoshy] will follow along with more detail on this, but we've run 
into a serious problem with this check. Basically, it's impossible to perform 
this kind of check accurately before the offset map is built. We now have 
partitions that should be compactable, since their total number of unique keys 
is far below the size of the offset map (currently ~39 million in our 
configuration), but whose messages are very frequent and very small. Even at a 
segment size of 64 MB, we have over 300 million messages in those segments. So 
this check creates a situation where log compaction should succeed, but fails 
because of a speculative check.
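To make the arithmetic concrete, here is a small sketch using the numbers from this comment (~39M map slots, >300M messages per 64 MB segment). The method name and structure are illustrative assumptions, not Kafka's actual code:

```java
// Illustrative arithmetic only; names are assumptions, not Kafka identifiers.
public class CheckSketch {
    // The speculative pre-check treats every message as a potential unique key.
    static boolean speculativeReject(long segmentMessages, long mapSlots) {
        return segmentMessages > mapSlots;
    }

    public static void main(String[] args) {
        long mapSlots = 39_000_000L;         // ~39M offset map slots in our configuration
        long segmentMessages = 300_000_000L; // >300M small messages in a 64 MB segment
        long uniqueKeys = 1_000_000L;        // hypothetical true key cardinality
        // The check rejects the segment even though the real keys would fit:
        System.out.println("rejected=" + speculativeReject(segmentMessages, mapSlots)
                + ", keysWouldFit=" + (uniqueKeys <= mapSlots));
    }
}
```

The gap between the two booleans is exactly the problem: the pre-check's only available upper bound (message count) wildly overestimates the true key cardinality for small-message topics.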

While I can play the game of trying to walk back segment sizes, there's no way 
to size segments by number of messages, so it's a guessing game. In addition, 
the check is clearly wrong in this case, so I shouldn't have to configure 
around it. Lastly, the check causes the log cleaner thread to exit, which means 
log compaction on the broker fails entirely, rather than just skipping that 
partition.

A better way to handle this would be to cleanly catch the original error you 
are seeing, generate a clear error message in the logs describing the failure, 
and allow the log cleaner to continue with other partitions. You could also 
maintain an in-memory blacklist of partitions in the log cleaner to make sure 
you don't come back around and try to compact the same partition again.
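The suggested handling could look roughly like the following sketch. This is not Kafka's actual cleaner code; the class, method, and the string-keyed partition identifier are all assumptions made for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of the suggested handling (assumed names, not Kafka's code):
// catch the per-partition failure, log it clearly, blacklist the partition in
// memory, and let the cleaner keep working on other partitions.
public class CleanerSketch {
    private final Set<String> uncleanable = new HashSet<>(); // in-memory blacklist

    /** Returns true if the partition was cleaned, false if skipped or failed. */
    boolean cleanPartition(String topicPartition, Runnable doClean) {
        if (uncleanable.contains(topicPartition)) {
            return false; // previously failed; don't retry
        }
        try {
            doClean.run();
            return true;
        } catch (IllegalArgumentException e) {
            // Clear log message instead of letting the cleaner thread exit:
            System.err.println("Skipping compaction of " + topicPartition
                    + ": offset map too small (" + e.getMessage() + ")");
            uncleanable.add(topicPartition);
            return false;
        }
    }
}
```

The first failing attempt logs and blacklists the partition; later passes over the same partition return immediately instead of re-triggering the error or killing the thread.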

> LogCleaner offset map overflow
> ------------------------------
>
>                 Key: KAFKA-2235
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2235
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, log
>    Affects Versions: 0.8.1, 0.8.2.0
>            Reporter: Ivan Simoneko
>            Assignee: Ivan Simoneko
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch
>
>
> We've seen log cleaning generate an error for a topic with lots of small 
> messages. It seems that an offset map overflow is possible if a log segment 
> contains more unique keys than there are empty slots in the offsetMap. 
> Checking baseOffset and map utilization before processing a segment is not 
> enough, because it doesn't take the segment size (the number of unique 
> messages in the segment) into account.
> I suggest estimating the upper bound on keys in a segment as the number of 
> messages in the segment and comparing it with the number of available slots 
> in the map (keeping the desired load factor in mind). This works in cases 
> where an empty map is capable of holding all the keys of a single segment. If 
> even a single segment cannot fit into an empty map, the cleanup process will 
> still fail. Perhaps there should be a limit on the number of entries per log 
> segment?
> Here is the stack trace for this error:
> 2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Attempt to add a new entry to a full offset map.
>        at scala.Predef$.require(Predef.scala:233)
>        at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
>        at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
>        at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
>        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>        at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>        at kafka.message.MessageSet.foreach(MessageSet.scala:67)
>        at kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
>        at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
>        at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
>        at scala.collection.immutable.Stream.foreach(Stream.scala:547)
>        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
>        at kafka.log.Cleaner.clean(LogCleaner.scala:307)
>        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
>        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
>        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
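The estimate proposed in the quoted description can be sketched as follows. The names and signature are assumptions for illustration, not Kafka's actual implementation:

```java
// Sketch of the proposed pre-check (assumed names, not Kafka's code): use the
// segment's message count as an upper bound on its unique keys, and only build
// the offset map for the segment if that bound fits within the map's remaining
// capacity at the desired load factor.
public class EstimateSketch {
    static boolean segmentFits(long segmentMessages, long mapSlots,
                               long usedSlots, double loadFactor) {
        long available = (long) (mapSlots * loadFactor) - usedSlots;
        return segmentMessages <= available;
    }
}
```

As the description notes, this is conservative: a segment whose message count exceeds even an empty map's capacity would still be rejected, even if its true key cardinality fits, which is the failure mode the comment above describes.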



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
