Carsten Rietz created KAFKA-5431:
------------------------------------

             Summary: LogCleaner stopped due to org.apache.kafka.common.errors.CorruptRecordException
                 Key: KAFKA-5431
                 URL: https://issues.apache.org/jira/browse/KAFKA-5431
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.10.2.1
            Reporter: Carsten Rietz


Hey all,
I have a strange problem with our UAT cluster of 3 Kafka brokers.

The __consumer_offsets topic was replicated to two instances, and our disks ran 
full due to a wrong configuration of the log cleaner. We fixed the 
configuration and upgraded from 0.10.1.1 to 0.10.2.1.

Today I increased the replication factor of the __consumer_offsets topic to 3 and 
triggered the replication to the third broker via kafka-reassign-partitions.sh.
That went well, but I am getting many errors like:
{code}
[2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,18] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
[2017-06-12 09:59:50,342] ERROR Found invalid messages during fetch for partition [__consumer_offsets,24] offset 0 error Record size is less than the minimum record overhead (14) (kafka.server.ReplicaFetcherThread)
{code}
I think these are due to the disk-full event.

The log cleaner threads died on these corrupt messages:
{code}
[2017-06-12 09:59:50,722] ERROR [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
org.apache.kafka.common.errors.CorruptRecordException: Record size is less than the minimum record overhead (14)
[2017-06-12 09:59:50,722] INFO [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
{code}

Looking at the files I see that some are truncated and some are just empty:
{code}
$ ls -lsh 00000000000000594653.log
0 -rw-r--r-- 1 user user 100M Jun 12 11:00 00000000000000594653.log
{code}
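One way to check whether a segment really contains corrupt records would be to dump it with the bundled DumpLogSegments tool (the log directory and partition below are just an example, not our real paths):
{code}
# log dir and partition are placeholders; --deep-iteration also walks into compressed message sets
$ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration \
    --files /var/kafka-logs/__consumer_offsets-18/00000000000000594653.log
{code}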

Sadly, I no longer have the logs from the disk-full event itself.

I have three questions:
* What is the best way to clean this up? Deleting the old log files and 
restarting the brokers?
* Why did Kafka not handle the disk-full event well? Does this only affect the 
cleanup, or may we also lose data?
* Could this be caused by the combination of the upgrade and the full disk?


And last but not least: keep up the good work. Kafka performs really well, 
is easy to administer, and has good documentation!




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
