This is interesting and could be a huge barrier to this change. Thanks for your clarification.
On Fri, 21 Feb 2025 at 18:52, Jun Rao <j...@confluent.io.invalid> wrote:
> Hi, Mikhail,
>
> Currently, the index file uses 4 bytes to represent the file position of a
> record batch within a segment file. This is based on the assumption that
> each segment is capped at 2GB. RemoteLogSegmentMetadata is a public
> interface and currently uses int to represent segmentSizeInBytes. If we
> want to increase the segment size, we need to consider the impact on those
> places too.
>
> Thanks,
>
> Jun
>
> On Thu, Feb 13, 2025 at 1:32 PM Mikhail Fesenko <prog...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I'd like to propose changing the data type of log.segment.bytes from int
> > to long, allowing segment sizes beyond the current 2GB limit.
> >
> > Rationale:
> >
> > Currently, the maximum segment size is constrained by the int type,
> > capping it at 2GB (max int). However, on modern hardware, such as
> > large-scale deployments on machines with multiple high-capacity disks
> > (e.g., EPYC servers with 4×15TB drives), this limit makes segments
> > inefficiently small and leads to too many file handles (log, index,
> > metadata). Increasing the limit to 3–4GB (or more) would better align
> > with today's storage capabilities.
> >
> > Would love to hear your thoughts!
> >
> > Best,
> >
> > Mikhail Fesenko
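To make the 4-byte constraint Jun describes concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not Kafka's actual index code: the class and method names are hypothetical, and the entry layout (a 4-byte relative offset followed by a 4-byte file position) is assumed for the example. It shows that a signed 4-byte position tops out at Integer.MAX_VALUE (just under 2 GiB), and that a position inside a 3 GiB segment would wrap negative if simply cast to int.

```java
import java.nio.ByteBuffer;

public class SegmentPositionLimit {
    // Hypothetical index-entry layout (assumption for illustration):
    // 4-byte relative offset + 4-byte file position = 8 bytes per entry.
    static final int ENTRY_SIZE = 8;

    // Encodes one entry; rejects positions that cannot fit in a signed 4-byte int.
    static ByteBuffer encodeEntry(int relativeOffset, long filePosition) {
        if (filePosition < 0 || filePosition > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "position " + filePosition + " does not fit in 4 bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(ENTRY_SIZE);
        buf.putInt(relativeOffset);
        buf.putInt((int) filePosition);
        buf.flip(); // make the written bytes readable from the start
        return buf;
    }

    // Reads the file position back out of an encoded entry.
    static int decodePosition(ByteBuffer entry) {
        return entry.getInt(4); // absolute read at byte 4; buffer position unchanged
    }

    public static void main(String[] args) {
        // Largest byte position a signed 4-byte field can hold: just under 2 GiB.
        System.out.println("Max 4-byte position: " + Integer.MAX_VALUE);

        // A position inside a 3 GiB segment wraps negative if truncated to int.
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println("3 GiB truncated to int: " + (int) threeGiB);

        // Encoding such a position is rejected instead of corrupting the entry.
        try {
            encodeEntry(0, threeGiB);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Running main prints 2147483647 as the maximum position, -1073741824 for the truncated 3 GiB value, and a rejection message for position 3221225472. Widening only log.segment.bytes to long would not be enough on its own; any field that stores a position or size as int (such as the index position and segmentSizeInBytes mentioned above) would need the same widening.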