[ https://issues.apache.org/jira/browse/KAFKA-13251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chaos updated KAFKA-13251:
--------------------------
    Description:
A disk error occurred on broker 42, and the ISR then shrank to broker 42 itself.
Why did the ISR shrink to the broker with the failing disk?
That is, why "Shrinking ISR from 55,42 to 42" rather than "Shrinking ISR from 55,42 to 55"?

Note: another partition (110) shrank correctly.

kafka logs:

broker42:
[2021-08-26 20:20:55,640] ERROR [ReplicaManager broker=42] Error processing fetch with max size 1048576 from consumer on partition topic_xx-123: (fetchOffset=11061228956, logStartOffset=-1, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Found record size 0 smaller than minimum record overhead (14) in file /data4/kafka-logs/topic_xx-123/00000000011060934646.log.
[2021-08-26 20:20:55,640] ERROR Error while appending records to topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
[2021-08-26 20:20:55,645] ERROR Error while deleting segments for topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
java.nio.file.FileSystemException: /data4/kafka-logs/topic_xx-123/00000000011040402299.log -> /data4/kafka-logs/topic_xx-123/00000000011040402299.log.deleted: Read-only file system
Suppressed: java.nio.file.FileSystemException: /data4/kafka-logs/topic_xx-123/00000000011040402299.log -> /data4/kafka-logs/topic_xx-123/00000000011040402299.log.deleted: Read-only file system
[2021-08-26 20:20:55,644] ERROR Error while appending records to topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
[2021-08-26 20:20:55,652] INFO [Partition topic_xx-123 broker=42] Shrinking ISR from 55,42 to 42. Leader: (highWatermark: 11061228956, endOffset: 11061228965). Out of sync replicas: (brokerId: 55, endOffset: 11061228956). (kafka.cluster.Partition)

broker55:
[2021-08-26 20:20:32,456] WARN [ReplicaFetcher replicaId=55, leaderId=42, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=55, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=830774713, epoch=1562806014), rackId=) (kafka.server.ReplicaFetcherThread)
[2021-08-26 20:20:43,503] INFO [Partition topic_xxx-110 broker=55] Shrinking ISR from 55,42 to 55. Leader: (highWatermark: 11061384367, endOffset: 11061388788). Out of sync replicas: (brokerId: 42, endOffset: 11061384367). (kafka.cluster.Partition)

disk error on broker42 is:
Aug 26 20:20:55 kernel: sd 0:2:5:0: [sdf] tag#33 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

  was:
A disk error occurred on broker 42, and the ISR then shrank to broker 42 itself.
Why did the ISR shrink to the broker with the failing disk?
That is, why "Shrinking ISR from 55,42 to 42" rather than "Shrinking ISR from 55,42 to 55"?

kafka logs:

broker42:
[2021-08-26 20:20:55,640] ERROR [ReplicaManager broker=42] Error processing fetch with max size 1048576 from consumer on partition topic_xx-123: (fetchOffset=11061228956, logStartOffset=-1, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.CorruptRecordException: Found record size 0 smaller than minimum record overhead (14) in file /data4/kafka-logs/topic_xx-123/00000000011060934646.log.
[2021-08-26 20:20:55,640] ERROR Error while appending records to topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
[2021-08-26 20:20:55,645] ERROR Error while deleting segments for topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
java.nio.file.FileSystemException: /data4/kafka-logs/topic_xx-123/00000000011040402299.log -> /data4/kafka-logs/topic_xx-123/00000000011040402299.log.deleted: Read-only file system
Suppressed: java.nio.file.FileSystemException: /data4/kafka-logs/topic_xx-123/00000000011040402299.log -> /data4/kafka-logs/topic_xx-123/00000000011040402299.log.deleted: Read-only file system
[2021-08-26 20:20:55,644] ERROR Error while appending records to topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
[2021-08-26 20:20:55,652] INFO [Partition topic_xx-123 broker=42] Shrinking ISR from 55,42 to 42. Leader: (highWatermark: 11061228956, endOffset: 11061228965). Out of sync replicas: (brokerId: 55, endOffset: 11061228956). (kafka.cluster.Partition)

broker55:
[2021-08-26 20:20:32,456] WARN [ReplicaFetcher replicaId=55, leaderId=42, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=55, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=830774713, epoch=1562806014), rackId=) (kafka.server.ReplicaFetcherThread)

disk error on broker42 is:
Aug 26 20:20:55 kernel: sd 0:2:5:0: [sdf] tag#33 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK


> 2.4.0
> -----
>
>                 Key: KAFKA-13251
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13251
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.4.0
>         Environment: linux 4.1.0
>            Reporter: chaos
>            Priority: Major
>
> A disk error occurred on broker 42, and the ISR then shrank to broker 42 itself.
> Why did the ISR shrink to the broker with the failing disk?
> That is, why "Shrinking ISR from 55,42 to 42" rather than "Shrinking ISR from
> 55,42 to 55"?
> Note: another partition (110) shrank correctly.
>
> kafka logs:
> broker42:
> [2021-08-26 20:20:55,640] ERROR [ReplicaManager broker=42] Error processing
> fetch with max size 1048576 from consumer on partition topic_xx-123:
> (fetchOffset=11061228956, logStartOffset=-1, maxBytes=1048576,
> currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.CorruptRecordException: Found record size 0
> smaller than minimum record overhead (14) in file
> /data4/kafka-logs/topic_xx-123/00000000011060934646.log.
> [2021-08-26 20:20:55,640] ERROR Error while appending records to
> topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
> [2021-08-26 20:20:55,645] ERROR Error while deleting segments for
> topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
> java.nio.file.FileSystemException:
> /data4/kafka-logs/topic_xx-123/00000000011040402299.log ->
> /data4/kafka-logs/topic_xx-123/00000000011040402299.log.deleted: Read-only
> file system
> Suppressed: java.nio.file.FileSystemException:
> /data4/kafka-logs/topic_xx-123/00000000011040402299.log ->
> /data4/kafka-logs/topic_xx-123/00000000011040402299.log.deleted: Read-only
> file system
> [2021-08-26 20:20:55,644] ERROR Error while appending records to
> topic_xx-123 in dir /data4/kafka-logs (kafka.server.LogDirFailureChannel)
> [2021-08-26 20:20:55,652] INFO [Partition topic_xx-123 broker=42] Shrinking
> ISR from 55,42 to 42. Leader: (highWatermark: 11061228956, endOffset:
> 11061228965). Out of sync replicas: (brokerId: 55, endOffset: 11061228956).
> (kafka.cluster.Partition)
>
> broker55:
> [2021-08-26 20:20:32,456] WARN [ReplicaFetcher replicaId=55, leaderId=42,
> fetcherId=0] Error in response for fetch request (type=FetchRequest,
> replicaId=55, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={},
> isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=830774713,
> epoch=1562806014), rackId=) (kafka.server.ReplicaFetcherThread)
> [2021-08-26 20:20:43,503] INFO [Partition topic_xxx-110 broker=55] Shrinking
> ISR from 55,42 to 55. Leader: (highWatermark: 11061384367, endOffset:
> 11061388788). Out of sync replicas: (brokerId: 42, endOffset: 11061384367).
> (kafka.cluster.Partition)
>
> disk error on broker42 is:
> Aug 26 20:20:55 kernel: sd 0:2:5:0: [sdf] tag#33 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK



--
This message was sent by Atlassian Jira
(v8.3.4#803005)