This is interesting and could be a huge barrier for this change. Thanks for
the clarification.
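
To make the constraint concrete, here is a minimal Java sketch of what happens
when a file position beyond 2GB is squeezed into a 4-byte field. This is not
Kafka code: the class and method names are made up for illustration, and it
assumes the position is stored as a signed Java int, which is consistent with
the 2GB cap mentioned below.

```java
import java.nio.ByteBuffer;

public class IndexPositionSketch {
    // Hypothetical helper: write a file position into a 4-byte slot
    // (assumed to be a signed int) and read it back.
    static int roundTripThroughFourBytes(long position) {
        ByteBuffer slot = ByteBuffer.allocate(4);
        slot.putInt((int) position); // narrowing cast keeps only the low 32 bits
        slot.flip();
        return slot.getInt();
    }

    public static void main(String[] args) {
        long withinCap = Integer.MAX_VALUE;        // 2^31 - 1 bytes, just under 2 GiB
        long beyondCap = 3L * 1024 * 1024 * 1024;  // 3 GiB

        // A position within the cap survives the round trip unchanged.
        System.out.println(roundTripThroughFourBytes(withinCap) == withinCap); // true

        // A position beyond the cap wraps around to a negative value.
        System.out.println(roundTripThroughFourBytes(beyondCap)); // -1073741824
    }
}
```

So widening log.segment.bytes alone would not be enough under that assumption:
any 4-byte position field would also need a wider or different encoding.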

On Fri, 21 Feb 2025 at 18:52, Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Mikhail,
>
> Currently, the index file uses 4 bytes to represent the file position of a
> record batch within a segment file. This is based on the assumption that
> each segment is capped at 2GB. RemoteLogSegmentMetadata is a public
> interface and currently uses int to represent segmentSizeInBytes. If we
> want to increase the segment size, we need to consider the impact to those
> places too.
>
> Thanks,
>
> Jun
>
> On Thu, Feb 13, 2025 at 1:32 PM Mikhail Fesenko <prog...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I’d like to propose changing the data type of log.segment.bytes from int to
> > long, allowing segment sizes beyond the current 2GB limit.
> >
> > Rationale:
> >
> > Currently, the maximum segment size is constrained by the int type, capping
> > it at 2GB (max int). However, with modern hardware, such as large-scale
> > deployments on machines with multiple high-capacity disks (e.g., EPYC
> > servers with 4×15TB drives), this limitation makes segment sizes
> > inefficiently small and leads to too many file handles (log, index,
> > metadata). Increasing the limit to 3–4GB (or more) would better align with
> > today’s storage capabilities.
> >
> > Would love to hear your thoughts!
> >
> > Best,
> >
> > Mikhail Fesenko
> >
>
