[ 
https://issues.apache.org/jira/browse/KAFKA-19967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandini Singhal updated KAFKA-19967:
------------------------------------
    Description: 
Large heap buffer allocations in the tiered storage read path cause significant 
GC pressure by creating "humongous" objects that bypass the young generation and 
go directly to the old generation. Using direct buffers for these I/O-centric 
allocations would eliminate this GC overhead.

The allocation in question (RemoteLogManager.java:1718):
{code:java}
int updatedFetchSize = remoteStorageFetchInfo.minOneMessage() && firstBatchSize > maxBytes
    ? firstBatchSize
    : maxBytes;
ByteBuffer buffer = ByteBuffer.allocate(updatedFetchSize);
{code}
Here maxBytes = Math.min(fetchMaxBytes, fetchInfo.maxBytes), which can reach 
55MB or more depending on:
  - replica.fetch.max.bytes (default: 1MB)
  - replica.fetch.response.max.bytes (default: 10MB)
  - the client-side max.partition.fetch.bytes
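
For illustration, a hypothetical broker configuration (overrides, not the 
defaults) under which maxBytes reaches the 55MB figure above:
{code}
# Hypothetical overrides, not defaults: 57671680 bytes = 55MB
replica.fetch.max.bytes=57671680
replica.fetch.response.max.bytes=57671680
{code}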
In the G1 collector (Kafka's default), objects larger than half a region size 
(~32MB with 64MB regions) are considered "humongous" and:
  1. Bypass eden and the young generation entirely
  2. Are allocated directly in the old generation
  3. Can only be reclaimed during expensive full/mixed GCs
  4. Trigger old-generation GCs more frequently
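
As a quick way to see where this threshold lands on a given JVM, here is a 
minimal, self-contained sketch (assuming a HotSpot JVM started with 
-XX:+UseG1GC; the class name is illustrative) that reads the actual 
G1HeapRegionSize and tests the 55MB worst case against it:
{code:java}
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public final class HumongousCheck {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Region size in bytes, as chosen explicitly or by G1's ergonomics.
        long regionSize = Long.parseLong(diagnostics.getVMOption("G1HeapRegionSize").getValue());
        long humongousThreshold = regionSize / 2; // G1 treats larger objects as humongous
        long fetchSize = 55L * 1024 * 1024;       // the 55MB worst case from this ticket
        System.out.printf("region=%dMB, humongous threshold=%dMB, 55MB buffer humongous=%b%n",
                regionSize >> 20, humongousThreshold >> 20, fetchSize > humongousThreshold);
    }
}
{code}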

Example: With a 4GB heap and InitiatingHeapOccupancyPercent=35, the 
old-generation occupancy target is 4GB × 35% = 1.4GB, so roughly 25 concurrent 
tiered storage fetch requests (25 × 55MB = 1.375GB) are enough to trigger an 
old-generation collection. Under sustained read load from tiered storage, this 
creates continuous GC pressure.

Solution: Use ByteBuffer.allocateDirect() for large fetch buffers in the tiered 
storage read path. Each buffer is used for only a single fetch request.
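
A minimal sketch of the proposed change (not the actual patch; 
DIRECT_ALLOCATION_THRESHOLD and the method names are illustrative, and the real 
change would live in RemoteLogManager):
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public final class DirectFetchBufferSketch {

    // Assumed cutoff: small fetches stay on heap, where short-lived objects die
    // young; only potentially-humongous sizes go off-heap. 1MB is illustrative.
    private static final int DIRECT_ALLOCATION_THRESHOLD = 1 << 20;

    static ByteBuffer allocateFetchBuffer(int updatedFetchSize) {
        // An off-heap allocation never creates a humongous G1 object.
        return updatedFetchSize >= DIRECT_ALLOCATION_THRESHOLD
                ? ByteBuffer.allocateDirect(updatedFetchSize)
                : ByteBuffer.allocate(updatedFetchSize);
    }

    static ByteBuffer readFetch(InputStream remoteSegInputStream, int updatedFetchSize) throws IOException {
        ByteBuffer buffer = allocateFetchBuffer(updatedFetchSize);
        // A direct buffer has no backing array, so read through a channel
        // rather than InputStream.read(byte[]).
        try (ReadableByteChannel channel = Channels.newChannel(remoteSegInputStream)) {
            while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                // loop until the buffer is full or the stream is exhausted
            }
        }
        buffer.flip();
        return buffer;
    }
}
{code}
Two caveats worth noting: direct memory is capped by -XX:MaxDirectMemorySize 
rather than -Xmx, so that limit must be sized for the expected number of 
concurrent fetches, and downstream consumers of the buffer (e.g. 
MemoryRecords.readableRecords) must not assume a backing heap array.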



> Reduce GC pressure in tiered storage read path by using direct memory buffers
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-19967
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19967
>             Project: Kafka
>          Issue Type: Improvement
>          Components: Tiered-Storage
>            Reporter: Nandini Singhal
>            Assignee: Nandini Singhal
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
