Mikhail Fesenko created KAFKA-19603:
---------------------------------------

             Summary:  Change log.segment.bytes configuration type from int to 
long to support segments larger than 2GB
                 Key: KAFKA-19603
                 URL: https://issues.apache.org/jira/browse/KAFKA-19603
             Project: Kafka
          Issue Type: Improvement
          Components: core, log
            Reporter: Mikhail Fesenko


h2. Description
h3. Summary

Change the data type of the *{{log.segment.bytes}}* configuration from *{{int}}* to 
*{{long}}* to allow segment sizes beyond the current 2GB limit imposed by the 
maximum value of a signed 32-bit integer.
h3. Current Limitation

The *{{log.segment.bytes}}* configuration currently uses an *{{int}}* data 
type, which limits the maximum segment size to ~2GB (2,147,483,647 bytes). This 
constraint becomes problematic for modern high-capacity storage deployments.
h3. Background: Kafka Log Segment Structure

Each Kafka topic partition consists of multiple log segments stored as separate 
files on disk. For each segment, Kafka maintains three core files:
 * {*}{{.log}} files{*}: Contain the actual message data
 * {*}{{.index}} files{*}: Store mappings between message offsets and their 
physical positions within the log file, allowing Kafka to quickly locate 
messages by their offset without scanning the entire log file
 * {*}{{.timeindex}} files{*}: Store mappings between message timestamps and 
their corresponding offsets, enabling efficient time-based retrieval of messages
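
For illustration, a partition directory might look like the listing below. 
Segment file names are the 20-digit, zero-padded base offset of the first 
message in that segment (the offsets here are made up):
{noformat}
my-topic-0/
  00000000000000000000.log        <- first segment, base offset 0
  00000000000000000000.index
  00000000000000000000.timeindex
  00000000000000004096.log        <- next segment begins at offset 4096
  00000000000000004096.index
  00000000000000004096.timeindex
{noformat}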

h3. Motivation
 # {*}Modern Hardware Capabilities{*}: Current deployments often use 
high-capacity storage (e.g., EPYC servers with 4×15TB drives), for which 2GB 
segments are disproportionately small
 # {*}File Handle Optimization{*}: Large Kafka deployments with many topics can 
have 50-100k open files across all segment types (.log, .index, .timeindex). 
Each active segment holds open file handles for all three files, so larger 
segments would reduce the total file count and improve caching efficiency
 # {*}Performance Benefits{*}: Fewer segment rotations in high-traffic 
scenarios would reduce I/O overhead and improve overall performance. Sequential 
disk operations are much faster than random access patterns
 # {*}Storage Efficiency{*}: Reducing segment file proliferation improves 
filesystem metadata performance and reduces inode usage on high-volume 
deployments
 # {*}Community Interest{*}: Similar requests have been raised in community 
forums (see [Confluent forum 
discussion|https://forum.confluent.io/t/what-happens-if-i-increase-log-segment-bytes/5845])

h3. Proposed Solution

Change the *{{log.segment.bytes}}* data type from *{{int}}* to *{{long}}*, 
allowing segment sizes of 3-4GB or larger to better align with modern storage 
capabilities.
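
As a rough, non-authoritative sketch of the config-level change (the real 
definition lives in the broker's log configuration; the class name and lower 
bound below are illustrative), using Kafka's public {{ConfigDef}} API:
{code:java}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Range;
import org.apache.kafka.common.config.ConfigDef.Type;

public class SegmentConfigSketch {
    public static final ConfigDef CONFIG = new ConfigDef()
        // Before (conceptually): Type.INT caps the value at Integer.MAX_VALUE (~2GB).
        // After: Type.LONG lifts that cap.
        .define("log.segment.bytes",
                Type.LONG,                  // was Type.INT
                1024L * 1024 * 1024,        // default of 1GB, now expressed as a long
                Range.atLeast(1024 * 1024), // illustrative lower bound, not Kafka's actual one
                Importance.HIGH,
                "The maximum size of a single log segment file.");
}
{code}
The definition itself is the easy part; the real work is in every call site 
that currently reads the value as an {{int}} (segment roll checks, index 
sizing, remote storage metadata, etc.).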
h3. Technical Considerations (Raised by Community)

Based on dev mailing list discussion:
 # {*}Index File Format Limitation{*}: Current index files use 4 bytes to 
represent file positions within segments, assuming a 2GB cap (Jun Rao). This 
means:
 ** {{.index}} files store offset-to-position mappings using 4-byte integers 
for file positions
 ** If a segment exceeded 2GB, position values would overflow the 4-byte field
 ** The index format may need to be updated to support 8-byte positions (see 
the sketch after this list)
 # {*}RemoteLogSegmentMetadata Interface{*}: Public interface currently uses 
{{int}} for {{segmentSizeInBytes}} and may need updates (Jun Rao)
 # {*}Segment File Ecosystem Impact{*}: Need to evaluate impact on all three 
file types (.log, .index, .timeindex) and their interdependencies
 # {*}Impact Assessment{*}: Need to audit all other components that assume 
the 2GB segment limit
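
To make the index concern in item 1 concrete: each {{.index}} entry today is 
8 bytes (a 4-byte offset relative to the segment's base offset plus a 4-byte 
file position), so any position past {{Integer.MAX_VALUE}} cannot be 
represented. A minimal sketch of the overflow and a possible widened layout 
(illustrative code, not Kafka's actual {{OffsetIndex}} implementation):
{code:java}
import java.nio.ByteBuffer;

public class IndexEntrySketch {
    // Current layout: 8 bytes per entry (4-byte relative offset + 4-byte position).
    static ByteBuffer entryV0(int relativeOffset, long position) {
        return ByteBuffer.allocate(8)
                .putInt(relativeOffset)
                .putInt((int) position); // <-- the 2GB assumption: long narrowed to int
    }

    // A widened layout: 12 bytes per entry with an 8-byte position removes the cap,
    // at the cost of larger index files and likely a format version bump.
    static ByteBuffer entryWidened(int relativeOffset, long position) {
        return ByteBuffer.allocate(12)
                .putInt(relativeOffset)
                .putLong(position);
    }

    public static void main(String[] args) {
        long pos = 3L * 1024 * 1024 * 1024;                  // a position 3GB into a segment
        System.out.println(entryV0(0, pos).getInt(4));       // prints -1073741824 (overflowed)
        System.out.println(entryWidened(0, pos).getLong(4)); // prints 3221225472 (correct)
    }
}
{code}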

h3. Questions for Discussion
 # What would be a reasonable maximum segment size limit?
 # Should this change be backward compatible or require a protocol/format 
version bump?
 # Are there any other components beyond index files and 
RemoteLogSegmentMetadata that need updates?

h3. Expected Benefits
 * Reduced number of segment files for high-volume topics
 * Improved file handle utilization and caching efficiency
 * Better alignment with modern storage hardware capabilities
 * Reduced segment rotation overhead in high-traffic scenarios

h3. Acceptance Criteria
 * {{log.segment.bytes}} accepts long values > 2GB
 * Index file format supports larger segments (if needed)
 * RemoteLogSegmentMetadata interface updated (if needed)
 * Backward compatibility maintained
 * Documentation updated
 * Unit and integration tests added
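
For the first criterion, a hypothetical unit test (illustrative names, reusing 
the sketched LONG-typed definition above rather than real Kafka test code) 
could assert that values beyond the int range parse cleanly:
{code:java}
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;
import org.apache.kafka.common.config.ConfigDef;
import org.junit.jupiter.api.Test;

class SegmentBytesLongTest {
    private static final ConfigDef DEF = new ConfigDef()
        .define("log.segment.bytes", ConfigDef.Type.LONG, 1024L * 1024 * 1024,
                ConfigDef.Range.atLeast(1024 * 1024), ConfigDef.Importance.HIGH,
                "Maximum size of a single log segment file.");

    @Test
    void acceptsSegmentSizeBeyondTwoGB() {
        // 4294967296 bytes = 4GB, which does not fit in an int
        AbstractConfig cfg = new AbstractConfig(DEF, Map.of("log.segment.bytes", "4294967296"));
        assertEquals(4_294_967_296L, cfg.getLong("log.segment.bytes"));
    }
}
{code}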


*Disclaimer*

I'm relatively new to Kafka internals and the JIRA contribution process. The 
original idea and motivation came from my experience with large-scale 
deployments, but I used Claude AI to help make this ticket more detailed and 
technically structured. There may be technical inaccuracies or missing 
implementation details that I haven't considered.
This ticket is open for community discussion and feedback before 
implementation. Expert review and guidance would be greatly appreciated.


