[ 
https://issues.apache.org/jira/browse/KAFKA-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078834#comment-18078834
 ] 

Muralidhar Basani commented on KAFKA-20552:
-------------------------------------------

Thanks [~chenhaifeng], that provides more context.

I'm interested in following the discussion.

> Support log segments larger than 2 GB
> -------------------------------------
>
>                 Key: KAFKA-20552
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20552
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Haifeng Chen
>            Priority: Major
>
> The {{log.segment.bytes}} broker config (and its topic-level synonym
> {{segment.bytes}}) is currently defined as {{ConfigDef.Type.INT}},
> capping the maximum segment size at {{Integer.MAX_VALUE}} (2,147,483,647
> bytes, ~2 GB). Additionally, the {{.index}} file format stores physical file
> positions as 4-byte signed integers, which also cannot address beyond ~2 GB.
>
> With modern storage hardware (multi-TB NVMe drives) and high-throughput 
> workloads, the 2 GB cap is increasingly a problem:
>  * *Excessive file handle usage*: Each segment needs 4 files
> ({{.log}}, {{.index}}, {{.timeindex}}, {{.txnindex}}). A 10
> TB partition with 2 GB segments means ~20,000 open files.
>  * *Frequent segment rolls*: A topic ingesting 500 MB/s rolls a new
> segment every ~4 seconds, amplifying index build, flush, and cleaner overhead.
>  * *More log cleaning / compaction work*: More segments means more
> compaction cycles with more small groups.
>  * *Remote storage overhead*: Each segment is an individual unit for
> tiered storage copy/delete operations.
>
> Allowing segments of 4 GB, 8 GB, or larger would significantly reduce these
> overheads for high-throughput, large-retention workloads.
>
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1333%3A+Support+log+segments+larger+than+2+GB
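
The figures in the description can be reproduced with a short back-of-the-envelope script. This is only a sketch, not Kafka code: the helper names {{segment_stats}} and {{fits_in_index}} are hypothetical, decimal units are used (1 GB = 10^9 bytes) to match the rounded numbers above, and it assumes, as the description does, that every segment carries all 4 files and that the stated ingest rate lands on a single partition.

```python
import struct

GB = 10**9                 # decimal units, matching the rounded "~2 GB" figures
INT32_MAX = 2**31 - 1      # Integer.MAX_VALUE = 2,147,483,647


def segment_stats(partition_bytes, segment_bytes, ingest_bytes_per_sec,
                  files_per_segment=4):
    """Estimate segment count, file count, and roll interval (hypothetical helper)."""
    segments = -(-partition_bytes // segment_bytes)  # ceiling division
    return {
        "segments": segments,
        "files": segments * files_per_segment,
        "roll_interval_s": segment_bytes / ingest_bytes_per_sec,
    }


def fits_in_index(position):
    """Return True if a file position fits in a 4-byte signed big-endian integer."""
    try:
        struct.pack(">i", position)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    # 10 TB partition, 500 MB/s ingest, for several candidate segment sizes.
    for size_gb in (2, 4, 8):
        print(size_gb, "GB:", segment_stats(10_000 * GB, size_gb * GB, 500 * 10**6))
    # Positions past Integer.MAX_VALUE cannot be stored in a 4-byte signed field.
    print(fits_in_index(INT32_MAX), fits_in_index(4 * GB))
```

With 2 GB segments the estimate gives 5,000 segments, 20,000 files, and a roll every 4 seconds, consistent with the numbers quoted in the description. Doubling the segment size halves the first two and doubles the roll interval. The {{fits_in_index}} check only illustrates why a 4-byte signed position field cannot address offsets beyond {{Integer.MAX_VALUE}}.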



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
