Mikhail Fesenko created KAFKA-19603: ---------------------------------------
             Summary: Change log.segment.bytes configuration type from int to long to support segments larger than 2GB
                 Key: KAFKA-19603
                 URL: https://issues.apache.org/jira/browse/KAFKA-19603
             Project: Kafka
          Issue Type: Improvement
          Components: core, log
            Reporter: Mikhail Fesenko

h2. Description

h3. Summary

Change the data type of the *{{log.segment.bytes}}* configuration from *{{int}}* to *{{long}}* to allow segment sizes beyond the current 2GB limit imposed by the integer maximum value.

h3. Current Limitation

The *{{log.segment.bytes}}* configuration currently uses an *{{int}}* data type, which limits the maximum segment size to ~2GB (2,147,483,647 bytes). This constraint becomes problematic for modern high-capacity storage deployments.

h3. Background: Kafka Log Segment Structure

Each Kafka topic partition consists of multiple log segments stored as separate files on disk. For each segment, Kafka maintains three core files:
* *{{.log}} files*: Contain the actual message data
* *{{.index}} files*: Map message offsets to their physical positions within the log file, allowing Kafka to quickly locate a message by offset without scanning the entire log file
* *{{.timeindex}} files*: Map message timestamps to their corresponding offsets, enabling efficient time-based retrieval of messages

h3. Motivation

# *Modern Hardware Capabilities*: Current deployments often use high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB segments are inefficiently small
# *File Handle Optimization*: Large Kafka deployments with many topics can have 50-100k open files across all segment types ({{.log}}, {{.index}}, {{.timeindex}}). Each segment requires open file handles, so larger segments would reduce the total number of files and improve caching efficiency
# *Performance Benefits*: Fewer segment rotations in high-traffic scenarios would reduce I/O overhead and improve overall performance.
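To make the 2GB ceiling concrete, here is a minimal plain-Java sketch (illustrative only, not Kafka's actual {{ConfigDef}} code) of why an {{int}}-typed size caps out at {{Integer.MAX_VALUE}} bytes and why a {{long}} lifts that limit:

```java
// Standalone sketch (not Kafka source): shows why an int-typed
// log.segment.bytes caps segments at ~2GB, and what a long allows.
public class SegmentSizeLimit {
    public static void main(String[] args) {
        // Current ceiling: the config value is held in a Java int.
        int maxIntSegment = Integer.MAX_VALUE;  // 2_147_483_647 bytes, just under 2 GiB
        System.out.println("int ceiling:  " + maxIntSegment);

        // A hypothetical 4 GiB segment size does not fit in an int...
        long fourGiB = 4L * 1024 * 1024 * 1024; // 4_294_967_296 bytes
        // ...and narrowing it to int silently overflows (to 0 for this value):
        System.out.println("int overflow: " + (int) fourGiB);

        // With a long-typed config, the same value is representable directly.
        System.out.println("4 GiB fits in a long: " + (fourGiB <= Long.MAX_VALUE));
    }
}
```

The overflow is silent (Java narrowing conversion keeps only the low 32 bits), which is why the type change has to be made at the config-definition level rather than by users simply passing larger numbers.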
Sequential disk operations are much faster than random access patterns
# *Storage Efficiency*: Reducing segment file proliferation improves filesystem metadata performance and reduces inode usage on high-volume deployments
# *Community Interest*: Similar requests have been raised in community forums (see [Confluent forum discussion|https://forum.confluent.io/t/what-happens-if-i-increase-log-segment-bytes/5845])

h3. Proposed Solution

Change *{{log.segment.bytes}}* from *{{int}}* to *{{long}}*, allowing segment sizes of 3-4GB or larger to better align with modern storage capabilities.

h3. Technical Considerations (Raised by Community)

Based on the dev mailing list discussion:
# *Index File Format Limitation*: Current index files use 4 bytes to represent file positions within segments, assuming the 2GB cap (Jun Rao). This means:
** {{.index}} files store offset-to-position mappings using 4-byte integers for file positions
** If segments exceed 2GB, position values would overflow the 4-byte limit
** The index format may need to be updated to support 8-byte positions
# *RemoteLogSegmentMetadata Interface*: This public interface currently uses {{int}} for {{segmentSizeInBytes}} and may need updates (Jun Rao)
# *Segment File Ecosystem Impact*: Need to evaluate the impact on all three file types ({{.log}}, {{.index}}, {{.timeindex}}) and their interdependencies
# *Impact Assessment*: Need to evaluate all components that assume the 2GB segment limit

h3. Questions for Discussion

# What would be a reasonable maximum segment size limit?
# Should this change be backward compatible, or should it require a protocol/format version bump?
# Are there any other components beyond index files and RemoteLogSegmentMetadata that need updates?

h3. Expected Benefits

* Reduced number of segment files for high-volume topics
* Improved file handle utilization and caching efficiency
* Better alignment with modern storage hardware capabilities
* Reduced segment rotation overhead in high-traffic scenarios

h3. Acceptance Criteria

* {{log.segment.bytes}} accepts long values > 2GB
* Index file format supports larger segments (if needed)
* RemoteLogSegmentMetadata interface updated (if needed)
* Backward compatibility maintained
* Documentation updated
* Unit and integration tests added

*Disclaimer*

I'm relatively new to Kafka internals and the JIRA contribution process. The original idea and motivation came from my experience with large-scale deployments, but I used Claude AI to help make this ticket more detailed and technically structured. There may be technical inaccuracies or missing implementation details that I haven't considered. This ticket is open for community discussion and feedback before implementation. Expert review and guidance would be greatly appreciated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
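As a supplement to the index-format consideration above, the following standalone sketch (illustrative layout and names only, not Kafka's actual {{OffsetIndex}} implementation) shows how a 4-byte file position silently corrupts once a segment grows past 2GB, and how an assumed 8-byte position field would round-trip correctly:

```java
import java.nio.ByteBuffer;

// Standalone sketch (not Kafka source): models an index entry as a
// 4-byte relative offset plus a 4-byte file position, the layout that
// assumes segments never exceed Integer.MAX_VALUE bytes (~2GB).
public class IndexEntrySketch {
    public static void main(String[] args) {
        long positionIn3GiBSegment = 3L * 1024 * 1024 * 1024; // 3_221_225_472

        // Current-style 8-byte entry: forcing the position into 4 bytes overflows.
        ByteBuffer entry = ByteBuffer.allocate(8);
        entry.putInt(42);                             // relative offset
        entry.putInt((int) positionIn3GiBSegment);    // silent narrowing overflow!
        entry.flip();
        entry.getInt();                               // skip the offset
        // Reads back a negative number, not 3221225472.
        System.out.println("4-byte position reads back as: " + entry.getInt());

        // A widened 12-byte entry (4-byte offset + 8-byte position) round-trips.
        ByteBuffer wide = ByteBuffer.allocate(12);
        wide.putInt(42);
        wide.putLong(positionIn3GiBSegment);
        wide.flip();
        wide.getInt();
        System.out.println("8-byte position reads back as: " + wide.getLong());
    }
}
```

This is why the ticket flags the index format as a prerequisite question: simply widening the config type without changing the entry layout would let positions overflow on disk.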