junrao commented on a change in pull request #11345:
URL: https://github.com/apache/kafka/pull/11345#discussion_r718789940
########## File path: core/src/main/scala/kafka/log/LogSegment.scala ##########
@@ -77,8 +77,9 @@ class LogSegment private[log] (val log: FileRecords,
       timeIndex.resize(size)
     }
-  def sanityCheck(timeIndexFileNewlyCreated: Boolean): Unit = {
-    if (lazyOffsetIndex.file.exists) {
+  def sanityCheck(timeIndexFileNewlyCreated: Boolean, isActiveSegment: Boolean): Unit = {
+    // We allow for absence of offset index file only for an empty active segment.
+    if ((isActiveSegment && size == 0) || lazyOffsetIndex.file.exists) {

Review comment:
   I am wondering why the active segment would be missing the offset index file after a clean shutdown. When we load the segments during broker restart, we call resizeIndexes() on the last segment. This should trigger the creation of the offset index file, which will be flushed on broker shutdown.

Review comment:
   I am still trying to understand whether the missing index is the result of a clean shutdown or a hard shutdown. When we roll a segment, the index on the new active segment is created lazily. However, during a clean shutdown, we force flush the active segment, which should trigger the creation of an empty index file because the following method is used in segment flush:

       `def offsetIndex: OffsetIndex = lazyOffsetIndex.get`

   On a hard shutdown, it's possible for the offset index to be missing. However, in that case, the offset index can be missing even when the log is not empty.
   So, I am wondering how common the issue we are fixing is.
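To make the three shutdown cases in this thread concrete, here is a minimal Scala sketch under a deliberately simplified model. `LazyFileIndex`, `SketchSegment`, `oldCheckPasses`, `newCheckPasses` and `freshDir` are hypothetical names, not Kafka's classes or API. The sketch only assumes what the comments state: the index file of a freshly rolled segment is created lazily, a flush goes through `offsetIndex` and therefore materializes the file, and the relaxed condition from the diff exempts only an empty active segment.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical stand-in for a lazily created index (NOT Kafka's LazyIndex):
// the backing file is only created the first time `get` is called.
final class LazyFileIndex(val file: File) {
  def get: File = {
    if (!file.exists()) file.createNewFile() // materialize on first access
    file
  }
}

// Hypothetical, simplified segment: tracks only its size and offset index.
final class SketchSegment(dir: File, baseOffset: Long) {
  var size: Int = 0 // bytes appended to the log

  // Zero-padded file name in the style of Kafka segment files.
  val lazyOffsetIndex = new LazyFileIndex(new File(dir, f"$baseOffset%020d.index"))

  def offsetIndex: File = lazyOffsetIndex.get

  // Appending here never touches the index, so the file may still be absent.
  def append(bytes: Int): Unit = size += bytes

  // Models a clean-shutdown flush: going through `offsetIndex` creates the
  // index file even if it stays empty.
  def flush(): Unit = {
    offsetIndex
    ()
  }

  // Condition before the patch: the index file must exist.
  def oldCheckPasses: Boolean = lazyOffsetIndex.file.exists()

  // Condition after the patch: an empty active segment is exempt.
  def newCheckPasses(isActiveSegment: Boolean): Boolean =
    (isActiveSegment && size == 0) || lazyOffsetIndex.file.exists()
}

def freshDir(): File = Files.createTempDirectory("segment-sketch").toFile

// Segment rolled, then a hard shutdown before anything touches its index.
val justRolled = new SketchSegment(freshDir(), 100L)
println(s"old: ${justRolled.oldCheckPasses}, new: ${justRolled.newCheckPasses(isActiveSegment = true)}")
// prints "old: false, new: true"
```

Under this model the relaxed check only helps in the "rolled, then hard shutdown while still empty" case. A segment that received data before a hard shutdown still fails both checks, and a cleanly flushed segment passes both, which is the asymmetry the second review comment points out.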