[ 
https://issues.apache.org/jira/browse/KAFKA-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17947272#comment-17947272
 ] 

Gaurav Narula commented on KAFKA-19200:
---------------------------------------

Also worth noting that indexes are trimmed to size in {{AbstractIndex#close}} 
but an {{IOException}} there is swallowed in {{LogSegment#close}}. This may 
therefore cause a corrupt index on disk despite a "clean" shutdown.
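
To illustrate the point above, here is a minimal sketch of how a close-quietly pattern can hide a failed trim. The classes {{FakeIndex}} and {{SwallowedCloseSketch}} and the helper {{closeQuietly}} are hypothetical stand-ins written for this example, not Kafka's actual code. The sketch only shows that the caller sees an apparently normal shutdown even though the trim never completed.

```java
import java.io.Closeable;
import java.io.IOException;

public class SwallowedCloseSketch {
    // Hypothetical stand-in for an index whose close() trims the file to size.
    static class FakeIndex implements Closeable {
        boolean trimmed = false;

        @Override
        public void close() throws IOException {
            // Simulate the trim failing (for example, an I/O error on the volume).
            // The 'trimmed' flag is never set because we fail before that point.
            throw new IOException("simulated failure while trimming index");
        }
    }

    // Hypothetical helper that logs and swallows the exception.
    // Returns true only if close() completed without throwing.
    static boolean closeQuietly(Closeable resource) {
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            System.err.println("Ignoring failure on close: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        boolean closedCleanly = closeQuietly(index);
        // No exception reaches the caller, yet the index was never trimmed.
        System.out.println("closedCleanly=" + closedCleanly + ", trimmed=" + index.trimmed);
    }
}
```

In this sketch the only trace of the failure is a log line, which is why a startup check is the remaining place to notice the problem.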

> Indexes should be sanity checked on startup
> -------------------------------------------
>
>                 Key: KAFKA-19200
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19200
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.9.0, 4.0.0
>            Reporter: Gaurav Narula
>            Assignee: Gaurav Narula
>            Priority: Major
>
> KAFKA-7283 removed sanity checks for indexes on broker startup as we
> thought they didn't add much benefit. It turns out that index corruption
> can occur independently of log segment corruption.
> Index corruption that is not caught early is quite tricky to debug. We 
> observed the following in production:
> Corruption led to a timeindex file on disk being mostly filled with {{0}}s 
> at the end. This file is then loaded in memory such that 
> {{DirectByteBuffer}}'s {{position=limit=10485756}}. Note that this is 4 bytes 
> short of 10MiB, the configured max index size.
> At this point, the log segment is eligible to be rolled as 
> {{TimeIndex#isFull}} will return {{true}}. We observe that the roll is 
> attempted from 2 code paths:
> 1. ReplicaFetcherThread attempts to roll the log segment as it tries to 
> append records
> 2. LogCleanerThread attempts to roll the log segment as it tries to clean 
> segments that breach retentionMs
> In both scenarios, {{LogSegment#onBecomeInactiveSegment}} is invoked which in 
> turn invokes {{TimeIndex#maybeAppend(long timestamp, long offset, boolean 
> skipFullCheck)}} with {{skipFullCheck=true}}, causing an append to an already 
> full TimeIndex which fails by throwing a {{BufferOverflowException}}.
> For (1), the exception causes the partition to be marked as failed, thereby 
> causing an under-replicated partition. For (2), the LogCleanerThread shuts 
> down, potentially causing a leak as other segments which are eligible for 
> cleanup aren't cleaned up.
> We should therefore reintroduce sanity checks on startup for indexes in 
> {{LogSegment#sanityCheck}}, as that is invoked regardless of an unclean 
> shutdown and it attempts to re-create the indexes if corruption is diagnosed.
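
The numbers in the description can be reproduced with a short standalone sketch. It assumes a time index entry is 12 bytes (an 8-byte timestamp followed by a 4-byte relative offset), which would explain why the buffer limit is 10485756 rather than 10485760: the limit is rounded down to a whole number of entries. The class {{TimeIndexOverflowSketch}} and its methods are hypothetical, and the "last timestamp must not be smaller than the first" test is only a simplified example of the kind of check a startup sanity check might perform, not the actual {{LogSegment#sanityCheck}} logic.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class TimeIndexOverflowSketch {
    // Assumed entry layout: 8-byte timestamp + 4-byte relative offset.
    static final int ENTRY_SIZE = 12;

    // Largest multiple of ENTRY_SIZE that fits in maxIndexSize bytes.
    static int usableBytes(int maxIndexSize) {
        return (maxIndexSize / ENTRY_SIZE) * ENTRY_SIZE;
    }

    // Simplified, hypothetical sanity check: in a non-empty index the last
    // timestamp should not be smaller than the first. Trailing zero-filled
    // entries violate this because their timestamp reads as 0.
    static boolean looksSane(ByteBuffer index) {
        int entries = index.position() / ENTRY_SIZE;
        if (entries == 0) {
            return true;
        }
        long firstTimestamp = index.getLong(0);
        long lastTimestamp = index.getLong((entries - 1) * ENTRY_SIZE);
        return lastTimestamp >= firstTimestamp;
    }

    // Returns true if appending one more entry overflows the buffer.
    static boolean appendOverflows(ByteBuffer index, long timestamp, int relativeOffset) {
        try {
            index.putLong(timestamp);
            index.putInt(relativeOffset);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    // Builds a buffer that mimics the corrupt file: one real entry, then zeros,
    // with position == limit as if every slot were a valid entry.
    static ByteBuffer corruptIndex(int maxIndexSize) {
        ByteBuffer index = ByteBuffer.allocate(usableBytes(maxIndexSize));
        index.putLong(1_700_000_000_000L);
        index.putInt(0);
        index.position(index.limit());
        return index;
    }

    public static void main(String[] args) {
        int maxIndexSize = 10 * 1024 * 1024; // 10 MiB
        System.out.println("usable bytes: " + usableBytes(maxIndexSize));
        ByteBuffer index = corruptIndex(maxIndexSize);
        System.out.println("looks sane: " + looksSane(index));
        System.out.println("append overflows: "
                + appendOverflows(index, 1_700_000_000_001L, 1));
    }
}
```

Under these assumptions the zero-filled tail makes the index appear full, so a forced append has no room left, while a cheap check at load time would have flagged the file before any roll was attempted.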



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
