[
https://issues.apache.org/jira/browse/SAMZA-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708097#comment-17708097
]
Eric Honer commented on SAMZA-2778:
-----------------------------------
[PR#1662|https://github.com/apache/samza/pull/1662] submitted for review.
> Make AzureBlobOutputStream buffer initialization size configurable.
> -------------------------------------------------------------------
>
> Key: SAMZA-2778
> URL: https://issues.apache.org/jira/browse/SAMZA-2778
> Project: Samza
> Issue Type: Bug
> Reporter: Aditya Toomula
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The existing {{AzureBlobOutputStream}} uses a {{ByteArrayOutputStream}} to
> buffer messages until {{flush()}}, and each new buffer is initialized to 10MB
> (Azure's maximum block size). This can cause issues with the G1 garbage
> collector (the default in Java 11), since these buffers are treated as
> humongous objects. G1 divides the heap into regions and considers any object
> larger than half a region to be humongous. Such objects are allocated
> directly in the old generation and are given an entire region to themselves,
> so the unused remainder of that region cannot be used for other
> allocations. If the object is larger than a region, multiple contiguous
> regions are allocated. With a large number of buffers, the JVM can run out
> of memory if no regions are empty when a new {{ByteArrayOutputStream}} is
> created. The JVM terminates because {{new}} requires immediate memory
> allocation and cannot wait for GC.
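A rough sketch of the arithmetic described above, assuming the "larger than half a region" rule and ignoring the array header. The class and method names are hypothetical and only illustrate the rule; they are not part of Samza or the JVM.

```java
// Hypothetical helper: estimates whether a buffer of a given size would be
// treated as humongous by G1 for a given region size, and how many
// contiguous regions it would occupy. The small array header is ignored.
class HumongousCheck {
    static final long MB = 1024L * 1024L;

    // An object is humongous when it is larger than half of one region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of whole regions needed to hold the object (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long bufferBytes = 10 * MB; // the current fixed initial buffer size
        long[] regionSizes = {1 * MB, 4 * MB, 16 * MB, 32 * MB};
        for (long region : regionSizes) {
            System.out.println("region=" + (region / MB) + "MB"
                + " humongous=" + isHumongous(bufferBytes, region)
                + " regions=" + regionsNeeded(bufferBytes, region));
        }
    }
}
```

Under these assumptions a 10MB buffer is humongous for every region size listed except 32MB, and with 4MB regions it occupies three contiguous regions, leaving 2MB of the last one unusable.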
> GC effectiveness can be improved if the {{ByteArrayOutputStream}} is allowed
> to start small and grow as messages are added, which delays or even avoids
> its being considered humongous. These buffers can still become humongous
> objects, but only once they grow to a sufficient size. Clients can customize
> the initial size to suit their systems.
> h3. References
> * "[Humongous Objects and Humongous
> Allocations|https://www.oracle.com/technical-resources/articles/java/g1gc.html#:~:text=Humongous%20Objects%20and%20Humongous%20Allocations,generation%20into%20%22Humongous%20regions%22.&text=A%20full%20garbage%20collection%20cycle%20compacts%20Humongous%20objects%20in%20place.]"
> * "[Part 1: Introduction to the G1 Garbage
> Collector|https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector]"
> * "[What's the deal with humongous objects in
> Java?|https://devblogs.microsoft.com/java/whats-the-deal-with-humongous-objects-in-java/]"