Aditya Toomula created SAMZA-2778:
-------------------------------------
Summary: Make AzureBlobOutputStream buffer initialization size
configurable.
Key: SAMZA-2778
URL: https://issues.apache.org/jira/browse/SAMZA-2778
Project: Samza
Issue Type: Bug
Reporter: Aditya Toomula
The existing {{AzureBlobOutputStream}} uses a {{ByteArrayOutputStream}} to
buffer messages until {{flush()}}, and each new buffer is initialized to 10MB
(Azure's maximum block size). This can cause issues with the G1 garbage
collector (default in Java 11) because these buffers are treated as humongous
objects. G1 divides the heap into regions and considers any object larger than
half of a region to be humongous. Such objects are allocated directly in the
old generation rather than in a young region, and each is given an entire
region. Because a humongous object owns its whole region, the unused remainder
of that region cannot serve other allocations. If the object is larger than a
region, multiple contiguous regions are allocated. If there are a large number
of buffers, the JVM can experience OOMs if no empty regions are available when
a new {{ByteArrayOutputStream}} is created. The JVM terminates because {{new}}
requires immediate memory allocation and cannot wait for GC.
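The arithmetic behind the humongous classification can be sketched as below. This is only an illustration of the rule as described above (larger than half a region, rounded up to whole regions); the class and method names are hypothetical, the object header is ignored, and the region size must be supplied by the caller because it depends on the heap size or on {{-XX:G1HeapRegionSize}}.

```java
// Hypothetical helper, not part of Samza or the JDK.
final class HumongousEstimate {
    private HumongousEstimate() { }

    // True when an object of objectBytes would be treated as humongous
    // under the "larger than half of a region" rule described above.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of contiguous regions the object would occupy (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    // Bytes left unused in the last region, which other allocations cannot use.
    static long wastedBytes(long objectBytes, long regionBytes) {
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }
}
```

For example, assuming a 4MB region size, a 10MB buffer is humongous, spans three regions, and leaves about 2MB of the last region unusable, whereas a 32KB buffer is an ordinary allocation.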
GC effectiveness can be improved by letting the {{ByteArrayOutputStream}} start
small and grow as messages are added, which delays or even avoids its
classification as humongous. A buffer can still become a humongous object, but
only once it grows to a sufficient size. Clients can customize the
initialization size to accommodate their systems.
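A minimal sketch of the proposed behavior follows, assuming the initial size is read from a string config map. The class name, the config key, and the choice to default to 10MB when the key is absent (to preserve the current behavior) are all assumptions made for illustration and are not the actual Samza implementation. Only {{ByteArrayOutputStream(int)}} and its protected {{buf}} field are real JDK API.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;

// Hypothetical buffer with a configurable initial capacity.
final class GrowableBlockBuffer extends ByteArrayOutputStream {
    // 10MB, the maximum block size cited in this issue.
    static final int MAX_BLOCK_BYTES = 10 * 1024 * 1024;
    // Hypothetical config key, invented for this sketch.
    static final String INIT_SIZE_KEY = "example.azureblob.initBufferSize.bytes";

    GrowableBlockBuffer(int initialBytes) {
        super(clamp(initialBytes));
    }

    // Falls back to the maximum block size when the key is absent (assumed default).
    static GrowableBlockBuffer fromConfig(Map<String, String> config) {
        String raw = config.get(INIT_SIZE_KEY);
        int requested = (raw == null) ? MAX_BLOCK_BYTES : Integer.parseInt(raw.trim());
        return new GrowableBlockBuffer(requested);
    }

    // Keeps the initial capacity within [0, MAX_BLOCK_BYTES].
    static int clamp(int requested) {
        return Math.max(0, Math.min(requested, MAX_BLOCK_BYTES));
    }

    // Current length of the backing array, which grows as data is written.
    synchronized int capacity() {
        return buf.length;
    }
}
```

With a small initial size, the backing array is enlarged by {{ByteArrayOutputStream}} itself as writes exceed the current capacity, so a buffer that never fills up never reaches a humongous size.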
h3. References
* "[Humongous Objects and Humongous
Allocations|https://www.oracle.com/technical-resources/articles/java/g1gc.html#:~:text=Humongous%20Objects%20and%20Humongous%20Allocations,generation%20into%20%22Humongous%20regions%22.&text=A%20full%20garbage%20collection%20cycle%20compacts%20Humongous%20objects%20in%20place.]"
* "[Part 1: Introduction to the G1 Garbage
Collector|https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector]"
* "[What's the deal with humongous objects in
Java?|https://devblogs.microsoft.com/java/whats-the-deal-with-humongous-objects-in-java/]"