Aditya Toomula created SAMZA-2778:
-------------------------------------

             Summary: Make AzureBlobOutputStream buffer initialization size 
configurable.
                 Key: SAMZA-2778
                 URL: https://issues.apache.org/jira/browse/SAMZA-2778
             Project: Samza
          Issue Type: Bug
            Reporter: Aditya Toomula


The existing {{AzureBlobOutputStream}} uses a {{ByteArrayOutputStream}} to 
buffer messages until {{flush()}} *and* new buffers are initialized to 10MB 
(Azure's maximum block size). This can cause issues with the G1 garbage 
collector (default in Java 11) since these would be considered humongous 
objects. The G1 GC divides the heap into regions and considers any object 
larger than half of a region to be humongous. These objects are 
allocated directly in the old generation and are given an entire region. 
Because a humongous object owns its whole region, the unused remainder of that 
region cannot be used for other allocations. If the object is larger than a region, multiple 
contiguous regions are allocated. With a large number of buffers, the JVM 
can experience OOMs if no regions are empty when a new 
{{ByteArrayOutputStream}} is created. The JVM terminates because {{new}} requires 
immediate memory allocation and cannot wait for GC.
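
To make the threshold concrete, here is a rough sketch of the rule described 
above. It is not G1's actual implementation: the class and method names are 
hypothetical, the 16-byte array header is an assumption for a typical 64-bit 
HotSpot JVM, and the region sizes are only sample values (the real region size 
depends on the heap size or on {{-XX:G1HeapRegionSize}}).

```java
final class HumongousCheck {
    // Assumed size of a byte[] header on a 64-bit HotSpot JVM (approximation).
    static final long ARRAY_HEADER_BYTES = 16;

    // An object is humongous when it is larger than half of a region.
    static boolean isHumongous(long arrayLength, long regionBytes) {
        return arrayLength + ARRAY_HEADER_BYTES > regionBytes / 2;
    }

    // Number of contiguous regions a humongous object of this size occupies.
    static long regionsNeeded(long arrayLength, long regionBytes) {
        long objectBytes = arrayLength + ARRAY_HEADER_BYTES;
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long bufferBytes = 10 * mb; // current initial buffer size
        long[] sampleRegionSizes = {1 * mb, 4 * mb, 16 * mb, 32 * mb};
        for (long region : sampleRegionSizes) {
            System.out.printf("region=%dMB humongous=%b regions=%d%n",
                    region / mb,
                    isHumongous(bufferBytes, region),
                    regionsNeeded(bufferBytes, region));
        }
    }
}
```

Under these assumptions, a 10MB buffer is humongous with a 4MB region and spans 
three contiguous regions, while with a 32MB region it falls below the 
threshold.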

GC effectiveness can be improved if the {{ByteArrayOutputStream}} starts small 
and is allowed to grow as messages are added, which delays or even avoids its 
being considered humongous. These buffers can still become humongous objects, 
but only once they grow to sufficient size. Clients can customize the initial 
buffer size to suit their systems.
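
A minimal sketch of the idea, assuming a hypothetical helper that takes the 
initial size as a parameter. The class name, the {{newBuffer}} method, and the 
32KB sample value are illustrative only and are not the actual Samza code or 
config; the 10MB cap is the block size mentioned above.

```java
import java.io.ByteArrayOutputStream;

final class ConfigurableBufferSketch {
    // Upper bound taken from the 10MB block size in the description.
    static final int MAX_BLOCK_BYTES = 10 * 1024 * 1024;

    // Creates a buffer with a caller-supplied initial capacity instead of
    // always pre-allocating the full block size.
    static ByteArrayOutputStream newBuffer(int initialBytes) {
        if (initialBytes <= 0 || initialBytes > MAX_BLOCK_BYTES) {
            throw new IllegalArgumentException(
                    "initial size must be in (0, " + MAX_BLOCK_BYTES + "]");
        }
        return new ByteArrayOutputStream(initialBytes);
    }

    public static void main(String[] args) {
        // Start at 32KB (sample value); the stream grows as data is written.
        ByteArrayOutputStream buffer = newBuffer(32 * 1024);
        byte[] message = new byte[1024];
        for (int i = 0; i < 100; i++) {
            buffer.write(message, 0, message.length);
        }
        System.out.println("buffered bytes: " + buffer.size());
    }
}
```

{{ByteArrayOutputStream}} enlarges its backing array on demand, so a stream 
that never receives much data never allocates a humongous array.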

h3. References
 * [Humongous Objects and Humongous Allocations|https://www.oracle.com/technical-resources/articles/java/g1gc.html#:~:text=Humongous%20Objects%20and%20Humongous%20Allocations,generation%20into%20%22Humongous%20regions%22.&text=A%20full%20garbage%20collection%20cycle%20compacts%20Humongous%20objects%20in%20place.]
 * [Part 1: Introduction to the G1 Garbage Collector|https://www.redhat.com/en/blog/part-1-introduction-g1-garbage-collector]
 * [What's the deal with humongous objects in Java?|https://devblogs.microsoft.com/java/whats-the-deal-with-humongous-objects-in-java/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
