[ https://issues.apache.org/jira/browse/KAFKA-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958095#comment-15958095 ]
Shuai Lin commented on KAFKA-5010:
----------------------------------

For now I can think of a quick fix that may help: always keep the capacity of the write buffer at twice that of the read buffer, as I did in [this commit|https://github.com/scrapinghub/kafka/commit/66b0315681b1cbefae941ba68face7fc7f7baa78]. It doesn't fix the problem at the root, but I think it can temporarily work around the write buffer overflow exception.
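As a rough sketch of the sizing idea only (the {{CleanerBuffers}} object and its {{allocate}} helper below are hypothetical names, not the code from the linked commit), the cleaner's I/O budget could be split so that the write buffer always gets roughly twice the capacity of the read buffer, instead of the usual even split:

{code}
import java.nio.ByteBuffer

// Illustrative only: split a total I/O budget (e.g. log.cleaner.io.buffer.size)
// so the write buffer is about twice the size of the read buffer, rather than
// giving each buffer an equal half.
object CleanerBuffers {
  def allocate(ioBufferSize: Int): (ByteBuffer, ByteBuffer) = {
    val readSize  = ioBufferSize / 3             // ~1/3 of the budget for reads
    val writeSize = ioBufferSize - readSize      // ~2/3 (about 2x readSize) for writes
    (ByteBuffer.allocate(readSize), ByteBuffer.allocate(writeSize))
  }
}

// Example: with a 512 KB budget the read buffer gets ~170 KB and the
// write buffer ~340 KB.
// val (readBuffer, writeBuffer) = CleanerBuffers.allocate(512 * 1024)
{code}

This keeps the total memory use unchanged while giving the write side extra headroom; a real fix would still have to explain why the filtered records can outgrow the read buffer in the first place.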
> Log cleaner crashed with BufferOverflowException when writing to the writeBuffer
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-5010
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5010
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.10.2.0
>            Reporter: Shuai Lin
>            Priority: Critical
>              Labels: reliability
>             Fix For: 0.11.0.0
>
> After upgrading from 0.10.0.1 to 0.10.2.0, the log cleaner thread crashed with a BufferOverflowException when writing the filtered records into the writeBuffer:
> {code}
> [2017-03-24 10:41:03,926] INFO [kafka-log-cleaner-thread-0], Starting  (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Beginning cleaning of log app-topic-20170317-20. (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,177] INFO Cleaner 0: Building offset map for app-topic-20170317-20... (kafka.log.LogCleaner)
> [2017-03-24 10:41:04,387] INFO Cleaner 0: Building offset map for log app-topic-20170317-20 for 1 segments in offset range [9737795, 9887707). (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,101] INFO Cleaner 0: Offset map for log app-topic-20170317-20 complete. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,106] INFO Cleaner 0: Cleaning log app-topic-20170317-20 (cleaning prior to Fri Mar 24 10:36:06 GMT 2017, discarding tombstones prior to Thu Mar 23 10:18:02 GMT 2017)... (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,110] INFO Cleaner 0: Cleaning segment 0 in log app-topic-20170317-20 (largest timestamp Fri Mar 24 09:58:25 GMT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-03-24 10:41:07,372] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.nio.BufferOverflowException
>         at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:206)
>         at org.apache.kafka.common.record.LogEntry.writeTo(LogEntry.java:98)
>         at org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:158)
>         at org.apache.kafka.common.record.MemoryRecords.filterTo(MemoryRecords.java:111)
>         at kafka.log.Cleaner.cleanInto(LogCleaner.scala:468)
>         at kafka.log.Cleaner.$anonfun$cleanSegments$1(LogCleaner.scala:405)
>         at kafka.log.Cleaner.$anonfun$cleanSegments$1$adapted(LogCleaner.scala:401)
>         at scala.collection.immutable.List.foreach(List.scala:378)
>         at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
>         at kafka.log.Cleaner.$anonfun$clean$6(LogCleaner.scala:363)
>         at kafka.log.Cleaner.$anonfun$clean$6$adapted(LogCleaner.scala:362)
>         at scala.collection.immutable.List.foreach(List.scala:378)
>         at kafka.log.Cleaner.clean(LogCleaner.scala:362)
>         at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
>         at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-03-24 10:41:07,375] INFO [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {code}
> I tried different values of log.cleaner.io.buffer.size, from 512K to 2M to 10M to 128M, all with no luck: the log cleaner thread crashed immediately after the broker was restarted. But setting it to 256MB fixed the problem!
> Here are the settings for the cluster:
> {code}
> - log.message.format.version = 0.9.0.0 (we use the 0.9 format because we have old consumers)
> - log.cleaner.enable = 'true'
> - log.cleaner.min.cleanable.ratio = '0.1'
> - log.cleaner.threads = '1'
> - log.cleaner.io.buffer.load.factor = '0.98'
> - log.roll.hours = '24'
> - log.cleaner.dedupe.buffer.size = 2GB
> - log.segment.bytes = 256MB (global is 512MB, but we have been using 256MB for this topic)
> - message.max.bytes = 10MB
> {code}
> Given that the readBuffer and the writeBuffer are exactly the same size (each is half of log.cleaner.io.buffer.size), why would the cleaner throw a BufferOverflowException when writing the filtered records into the writeBuffer? IIUC that should never happen, because the size of the filtered records should be no greater than the size of the readBuffer, and thus no greater than the size of the writeBuffer.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)