Hi, Mikhail,

Currently, the index file uses 4 bytes to represent the file position of a record batch within a segment file. This is based on the assumption that each segment is capped at 2GB. RemoteLogSegmentMetadata is a public interface and currently uses int to represent segmentSizeInBytes. If we want to increase the segment size, we need to consider the impact on those places too.
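To make the index constraint concrete, here is a minimal sketch of why a 4-byte position field stops working past 2GB. It is not the actual index code: the class and method names (IndexEntrySketch, encode, decodePosition) are hypothetical, and the 8-byte entry layout (4-byte relative offset followed by 4-byte position) is an assumption made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not Kafka's real index implementation.
final class IndexEntrySketch {
    // Assumed entry layout: 4-byte relative offset + 4-byte file position.
    static final int ENTRY_SIZE = 8;

    // Encodes one entry, refusing positions that a signed 4-byte field cannot hold.
    static ByteBuffer encode(int relativeOffset, long position) {
        if (position < 0 || position > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "position " + position + " does not fit in a signed 4-byte field");
        }
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
        buf.putInt(relativeOffset);
        buf.putInt((int) position);
        buf.flip();
        return buf;
    }

    // Reads the position field back with an absolute read at byte 4.
    static long decodePosition(ByteBuffer entry) {
        return entry.getInt(4);
    }

    public static void main(String[] args) {
        // A position below 2GB round-trips unchanged.
        System.out.println(decodePosition(encode(0, 1_000_000L)));

        // A position inside a 3GB segment wraps to a negative value when
        // narrowed to int, so it cannot be stored as-is.
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println((int) threeGiB);
    }
}
```

Any int-typed size field, such as segmentSizeInBytes, would hit the same narrowing problem, so widening log.segment.bytes alone would not be enough.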
Thanks,

Jun

On Thu, Feb 13, 2025 at 1:32 PM Mikhail Fesenko <prog...@gmail.com> wrote:

> Hi everyone,
>
> I’d like to propose changing the data type of log.segment.bytes from int to
> long, allowing segment sizes beyond the current 2GB limit.
>
> Rationale:
>
> Currently, the maximum segment size is constrained by the int type, capping
> it at 2GB (max int). However, with modern hardware, such as large-scale
> deployments on machines with multiple high-capacity disks (e.g., EPYC
> servers with 4×15TB drives), this limitation makes segment sizes
> inefficiently small and results in too many file handles (log, index,
> metadata). Increasing the limit to 3–4GB (or more) would better align with
> today’s storage capabilities.
>
> Would love to hear your thoughts!
>
> Best,
>
> Mikhail Fesenko