[ https://issues.apache.org/jira/browse/KAFKA-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947272#comment-17947272 ]
Gaurav Narula commented on KAFKA-19200:
---------------------------------------

Also worth noting that indexes are trimmed to size in {{AbstractIndex#close}}, but an {{IOException}} there is swallowed in {{LogSegment#close}}. This may therefore leave a corrupt index on disk despite a "clean" shutdown.

> Indexes should be sanity checked on startup
> -------------------------------------------
>
>                 Key: KAFKA-19200
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19200
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.9.0, 4.0.0
>            Reporter: Gaurav Narula
>            Assignee: Gaurav Narula
>            Priority: Major
>
> KAFKA-7283 removed sanity checks for indexes on broker startup because we thought they added little benefit. It turns out that log segment corruption may occur independently of index corruption.
> An index corruption that is not caught early is quite tricky to debug. We observed the following in production:
> A corruption led to a timeindex file on disk being mostly filled with {{0}}s at the end. This file is then loaded into memory such that the {{DirectByteBuffer}}'s {{position=limit=10485756}}. Note that this is 4 bytes short of 10MiB, the configured max index size.
> At this point, the log segment is eligible to be rolled, as {{TimeIndex#isFull}} will return {{true}}. We observed that the roll is attempted from two code paths:
> 1. ReplicaFetcherThread attempts to roll the log segment as it tries to append records
> 2. LogCleanerThread attempts to roll the log segment as it tries to clean segments that breach retentionMs
> In both scenarios, {{LogSegment#onBecomeInactiveSegment}} is invoked, which in turn invokes {{TimeIndex#maybeAppend(long timestamp, long offset, boolean skipFullCheck)}} with {{skipFullCheck=true}}, causing an append to an already full TimeIndex, which fails by throwing a {{BufferOverflowException}}.
> For (1), the exception causes the partition to be marked as failed, thereby causing an under-replicated partition.
> For (2), the LogCleanerThread shuts down, potentially causing a leak, as other segments eligible for cleanup aren't cleaned up.
> We should therefore reintroduce sanity checks for indexes on startup in {{LogSegment#sanityCheck}}, as that is invoked regardless of whether the shutdown was unclean, and it attempts to re-create the indexes if corruption is diagnosed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
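A minimal standalone sketch of the buffer mechanics described in the issue (not Kafka code; the class name, entry layout constant, and sample values are illustrative): a buffer whose position has already reached its limit looks "full", and any further append throws {{java.nio.BufferOverflowException}}, analogous to {{TimeIndex#maybeAppend}} being called with {{skipFullCheck=true}} on the corrupted index.

```java
import java.nio.ByteBuffer;
import java.nio.BufferOverflowException;

public class TimeIndexOverflowSketch {
    // A time-index entry is 12 bytes: 8-byte timestamp + 4-byte relative offset.
    static final int ENTRY_SIZE = 12;

    public static void main(String[] args) {
        // Hypothetical corrupted mmap state from the report: the loaded
        // buffer ends up with position == limit == 10485756 (4 bytes short
        // of the 10 MiB max index size), even though the tail is zeros.
        ByteBuffer mmap = ByteBuffer.allocate(10 * 1024 * 1024 - 4);
        mmap.position(mmap.limit());

        // isFull() analogue: no room for another entry, so the segment
        // becomes eligible to be rolled.
        boolean isFull = mmap.remaining() < ENTRY_SIZE;
        System.out.println("isFull=" + isFull);

        // maybeAppend(..., skipFullCheck=true) analogue: append anyway.
        try {
            mmap.putLong(System.currentTimeMillis()); // timestamp
            mmap.putInt(42);                          // relative offset
        } catch (BufferOverflowException e) {
            System.out.println("append failed: BufferOverflowException");
        }
    }
}
```

Running this prints {{isFull=true}} followed by {{append failed: BufferOverflowException}}, mirroring how a full-looking index passes the skipped check and then fails on the write.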