[ 
https://issues.apache.org/jira/browse/SAMZA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708097#comment-17708097
 ] 

Eric Honer commented on SAMZA-2778:
-----------------------------------

[PR#1662|https://github.com/apache/samza/pull/1662] submitted for review.

> Make AzureBlobOutputStream buffer initialization size configurable.
> -------------------------------------------------------------------
>
>                 Key: SAMZA-2778
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2778
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Aditya Toomula
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The existing {{AzureBlobOutputStream}} uses a {{ByteArrayOutputStream}} to
> buffer messages until {{flush()}} is called, and each new buffer is
> initialized to 10MB (Azure's maximum block size). This can cause issues with
> the G1 garbage collector (default in Java 11) because these buffers are
> treated as humongous objects. G1 divides the heap into regions and considers
> any object larger than half a region to be humongous. Such objects are
> allocated directly in the old generation and are given one or more entire
> regions. The unused remainder of such a region cannot be used for other
> allocations. If the object is larger than a region, multiple contiguous
> regions are allocated. With a large number of buffers, the JVM can hit OOMs
> if no regions are empty when a new {{ByteArrayOutputStream}} is created. The
> JVM terminates because {{new}} requires immediate memory allocation and
> cannot wait for GC.
> GC effectiveness can be improved if the {{ByteArrayOutputStream}} is allowed
> to grow as messages are added, which delays or even avoids the buffer being
> treated as humongous. These buffers can still become humongous objects, but
> only once they grow to a sufficient size. Clients can customize the
> initialization size to suit their systems.
> h3. References
>  * "[Humongous Objects and Humongous
> Allocations|https://www.oracle.com/technical-resources/articles/java/g1gc.html#:~:text=Humongous%20Objects%20and%20Humongous%20Allocations,generation%20into%20%22Humongous%20regions%22.&text=A%20full%20garbage%20collection%20cycle%20compacts%20Humongous%20objects%20in%20place.]"
>  * "[Part 1: Introduction to the G1 Garbage
> Collector|https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector]"
>  * "[What's the deal with humongous objects in
> Java?|https://devblogs.microsoft.com/java/whats-the-deal-with-humongous-objects-in-java/]"
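
The sizing argument in the description above can be sketched roughly as
follows. This is only an illustration, not the code from PR#1662: the class
and method names (BufferSizingSketch, isHumongous, newBuffer), the 4MB region
size, and the 32KB initial size are all assumptions chosen for the example.
It uses the "larger than half a region" rule as stated in the description and
ignores the array header, so results near the threshold are approximate.

```java
import java.io.ByteArrayOutputStream;

class BufferSizingSketch {
    // Per the issue description: G1 treats an object larger than half a
    // region as humongous. The array header is ignored in this estimate.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Hypothetical helper: create the buffer with a caller-chosen initial
    // capacity instead of always reserving the maximum block size up front.
    static ByteArrayOutputStream newBuffer(int initialCapacityBytes) {
        return new ByteArrayOutputStream(initialCapacityBytes);
    }

    public static void main(String[] args) {
        long regionBytes = 4L * 1024 * 1024;   // assumed region size (-XX:G1HeapRegionSize=4m)
        int maxBlockBytes = 10 * 1024 * 1024;  // 10MB initial size from the description
        int smallInitBytes = 32 * 1024;        // illustrative smaller initial size

        System.out.println(isHumongous(maxBlockBytes, regionBytes));   // true
        System.out.println(isHumongous(smallInitBytes, regionBytes));  // false

        // The buffer grows on demand as messages are written.
        ByteArrayOutputStream buffer = newBuffer(smallInitBytes);
        byte[] message = new byte[1024];
        for (int i = 0; i < 100; i++) {
            buffer.write(message, 0, message.length);
        }
        System.out.println(buffer.size());  // 102400
    }
}
```

With these assumed numbers, a buffer pre-sized to 10MB is humongous from the
moment it is created, while a 32KB buffer only crosses the threshold if
enough data is written before the next flush.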



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
