[ https://issues.apache.org/jira/browse/KAFKA-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897635#comment-17897635 ]
Teddy Yan commented on KAFKA-9613:
----------------------------------

Yes, it's a disk issue. I hit this problem too and ran a test against this kind of hardware failure, and found an interesting Kafka behavior: a replica is effectively a consumer of its leader. If the leader's disk has a problem and gets stuck, the other replicas cannot get the data either, and the whole partition stops working. The leader does not stream data directly to the replicas; it writes to its own disk first and the replicas fetch from there, so if that write path is broken, nothing works. If the disk is damaged, we can only wait for retention to expire the segment, and all data in the meantime is lost. Stopping the world may help the operator notice the problem, but data is still lost before manual recovery. My question is whether we could have some configuration to skip the corrupt log and get back to work as soon as possible, instead of waiting for the retention timeout.

> CorruptRecordException: Found record size 0 smaller than minimum record overhead
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-9613
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9613
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.6.2
>            Reporter: Amit Khandelwal
>            Assignee: hudeqi
>            Priority: Major
>
> 20200224;21:01:38: [2020-02-24 21:01:38,615] ERROR [ReplicaManager broker=0] Error processing fetch with max size 1048576 from consumer on partition SANDBOX.BROKER.NEWORDER-0: (fetchOffset=211886, logStartOffset=-1, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
> 20200224;21:01:38: org.apache.kafka.common.errors.CorruptRecordException: Found record size 0 smaller than minimum record overhead (14) in file /data/tmp/kafka-topic-logs/SANDBOX.BROKER.NEWORDER-0/00000000000000000000.log.
> 20200224;21:05:48: [2020-02-24 21:05:48,711] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> 20200224;21:10:22: [2020-02-24 21:10:22,204] INFO [GroupCoordinator 0]: Member xxxxxxxx_011-9e61d2c9-ce5a-4231-bda1-f04e6c260dc0-StreamThread-1-consumer-27768816-ee87-498f-8896-191912282d4f in group yyyyyyyyy_011 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
>
> [https://stackoverflow.com/questions/60404510/kafka-broker-issue-replica-manager-with-max-size#]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
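For illustration, the size check that produces the quoted CorruptRecordException can be sketched outside the broker. This is a minimal, hypothetical Python sketch, assuming the legacy per-entry layout of an 8-byte offset plus a 4-byte size field (12 bytes of log overhead) and the 14-byte minimum record overhead named in the error; it is not Kafka's actual implementation, only a model of the validation it performs:

```python
import struct

LOG_OVERHEAD = 12          # assumed: 8-byte offset + 4-byte size per entry
MIN_RECORD_OVERHEAD = 14   # the minimum quoted in the error message

def scan_segment(data: bytes) -> int:
    """Walk a segment buffer entry by entry and fail on an undersized
    record, mirroring the broker's corrupt-record check. Returns the
    number of bytes successfully validated."""
    pos = 0
    while pos + LOG_OVERHEAD <= len(data):
        # Big-endian: signed 64-bit offset, signed 32-bit record size.
        _offset, size = struct.unpack_from(">qi", data, pos)
        if size < MIN_RECORD_OVERHEAD:
            raise ValueError(
                f"Found record size {size} smaller than minimum "
                f"record overhead ({MIN_RECORD_OVERHEAD}) at position {pos}"
            )
        pos += LOG_OVERHEAD + size
    return pos

# A zeroed tail (as a faulty disk can leave behind) parses as an entry
# with record size 0, which is exactly the failure mode in the log above.
valid = struct.pack(">qi", 0, 14) + b"\x00" * 14
corrupt = valid + b"\x00" * 12  # twelve zero bytes -> offset 0, size 0
```

Under these assumptions, `scan_segment(valid)` succeeds while `scan_segment(corrupt)` raises on the zeroed entry, which is why the fetch path stops at that position until the segment is removed.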