Hi Murali, Thank you for the thorough review -- these are excellent questions that have helped sharpen the design.
mb-1 (Read path format detection) Good catch -- the KIP was unclear on this. The design uses a two-layer approach: Primary (write path): MetadataVersion controls which format is used for new index files. Before IBP_4_4_IV1 finalization, all new indexes are written in the legacy 8-byte format. After finalization, new indexes use the 12-byte format. Fallback (read path): When opening an existing index file, the format is auto-detected from the file content -- no filename suffix or marker needed. The detection analyzes the file size (divisible by 8 only? by 12 only?) and, for ambiguous sizes (divisible by both), validates the first few entries with each format to determine which interpretation produces valid, monotonically ordered data. If both interpretations are valid (rare), the MetadataVersion hint is used as the tiebreaker. This means after finalization, a log directory may contain both 8-byte and 12-byte index files, and each will be read correctly. When an old-format index is rebuilt (e.g., during recover()), it will be written in the new format. I will update the KIP to document this mechanism in detail. mb-2 (RemoteIndexCache) RemoteIndexCache will rely on the same auto-detection mechanism, since it fetches index files from tiered storage without knowing which format was active when the segment was uploaded. The auto-detection reads the file content and determines the correct format, so a 12-byte index fetched from remote storage will never be misread as legacy format. I will add this to the KIP. mb-3 (Index size guidance) Good point. I will add concrete guidance to the KIP: with 12-byte entries, log.index.size.max.bytes (default 10 MB) holds approximately 873K entries. For an 8 GB segment with the default log.index.interval.bytes of 4096 bytes, approximately 2M index entries would be needed, requiring approximately 24 MB of index space. Operators using segments larger than 2 GB should increase log.index.size.max.bytes proportionally. mb-4 (SegmentPosition rationale) SegmentPosition.relativePosition is constructed from LogOffsetMetadata.relativePositionInSegment, which will become long. Widening SegmentPosition to match avoids lossy truncation at the boundary. In practice KRaft metadata segments are small (~1 GB), so this is purely for type consistency, but it prevents subtle bugs if the types ever diverge. I will add this rationale to the KIP. mb-5 (BaseRecords.sizeInBytes() and internal callers) You are absolutely right -- the example you pointed out (int physicalPosition = log.sizeInBytes() in LogSegment) would be a real bug for segments larger than 2 GB. This and all similar internal callers will be migrated to a new sizeInBytesLong() method that returns the true long size. The existing sizeInBytes() will clamp at Integer.MAX_VALUE. The reason clamping is safe for the remaining ~449 callers is that they are in the client/network layer, where individual fetch responses are bounded by max.partition.fetch.bytes (an int config, default 1 MB). A single fetch response will never need to represent more than ~2 GB of data, so the clamped int return value is sufficient for those call sites. I will add a complete table of all migrated internal call sites to the KIP so nothing is missed during implementation review. mb-6 (Internal type changes) Yes, exactly right -- all of these position and size variables in LogSegment (and cascading callers in LocalLog, Cleaner, LogLoader, UnifiedLog, etc.) will be widened from int to long. Thank you for calling these out specifically. I will capture every such change in a table in the KIP's Proposed Changes section, so reviewers can verify completeness during implementation. Thanks again for the detailed review, Murali. I will update the KIP with all of these clarifications. Best, Haifeng
