[
https://issues.apache.org/jira/browse/KAFKA-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078834#comment-18078834
]
Muralidhar Basani commented on KAFKA-20552:
-------------------------------------------
Thanks [~chenhaifeng], that provides more context.
Interested in following the discussion.
> Support log segments larger than 2 GB
> -------------------------------------
>
> Key: KAFKA-20552
> URL: https://issues.apache.org/jira/browse/KAFKA-20552
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Reporter: Haifeng Chen
> Priority: Major
>
> The {{log.segment.bytes}} broker config (and its topic-level synonym
> {{segment.bytes}}) is currently defined as {{ConfigDef.Type.INT}},
> capping the maximum segment size at {{Integer.MAX_VALUE}} (2,147,483,647
> bytes, ~2 GB). Additionally, the {{.index}} file format stores physical file
> positions as 4-byte signed integers, which also cannot address beyond ~2 GB.
> With modern storage hardware (multi-TB NVMe drives) and high-throughput
> workloads, the 2 GB cap is increasingly a problem:
> * *Excessive file handle usage*: Each segment needs 4 files
> ({{.log}}, {{.index}}, {{.timeindex}}, {{.txnindex}}). A 10
> TB partition with 2 GB segments means ~20,000 open files.
> * *Frequent segment rolls*: A topic ingesting 500 MB/s rolls a new
> segment every ~4 seconds, amplifying index build, flush, and cleaner overhead.
> * *More log cleaning / compaction work*: More segments mean more
> compaction cycles, each working over more, smaller segment groups.
> * *Remote storage overhead*: Each segment is an individual unit for
> tiered storage copy/delete operations.
> Allowing segments of 4 GB, 8 GB, or larger would significantly reduce these
> overheads for high-throughput, large-retention workloads.
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1333%3A+Support+log+segments+larger+than+2+GB
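
The back-of-envelope numbers quoted in the description can be checked with a quick sketch; the partition size, segment size, and ingest rate below are illustrative assumptions from the issue text, not Kafka defaults:

```python
# The current cap: log.segment.bytes is an INT config, so it tops out at
# Java's Integer.MAX_VALUE.
MAX_SEGMENT_BYTES = 2**31 - 1  # 2,147,483,647 bytes, ~2 GB

# Assumed workload, matching the figures in the issue description.
partition_bytes = 10 * 1024**4        # 10 TiB partition
segment_bytes = 2 * 1024**3           # 2 GiB segments (near the cap)
files_per_segment = 4                 # .log, .index, .timeindex, .txnindex

segments = partition_bytes // segment_bytes
open_files = segments * files_per_segment  # ~20,000 open files

ingest_bytes_per_sec = 500 * 1024**2  # 500 MiB/s ingest
roll_interval_sec = segment_bytes / ingest_bytes_per_sec  # ~4 s per roll

print(MAX_SEGMENT_BYTES)             # 2147483647
print(segments, open_files)          # 5120 20480
print(round(roll_interval_sec, 1))   # 4.1
```

Doubling the segment size to 4 GiB halves both the open-file count and the roll frequency in this model, which is the reduction the issue is after.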
--
This message was sent by Atlassian Jira
(v8.20.10#820010)