Ryan Blue created PARQUET-787:

             Summary: Add a size limit for heap allocations when reading
                 Key: PARQUET-787
                 URL: https://issues.apache.org/jira/browse/PARQUET-787
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.9.0
            Reporter: Ryan Blue
            Assignee: Ryan Blue

[G1GC allocates humongous objects directly in the old 
generation|https://www.infoq.com/articles/tuning-tips-G1-GC] to avoid 
unnecessary copies, which means that these allocations aren't garbage collected 
until a full GC runs. Humongous objects are objects that are 50% of the region 
size or more. Region size is at most 32MB (see the table for [region size from 

Parquet currently allocates one large buffer for each contiguous group of column 
chunks, which in many cases is not garbage collected until a full GC. Adding a 
limit on the allocation size should allow users to split row groups across 
multiple buffers so that each buffer can be collected once it has been read.
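
As a rough illustration of the idea (not the actual parquet-mr change), the 
sketch below covers a contiguous read of totalBytes with several heap buffers, 
none larger than a configurable limit. The class name BoundedAllocator, the 
method allocate, and the parameter maxAllocationSize are hypothetical names 
chosen for this example. Since humongous objects are those at 50% of the region 
size or more, a limit below half the configured G1 region size should keep each 
buffer out of the humongous category.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of parquet-mr. */
final class BoundedAllocator {
  private BoundedAllocator() {}

  /**
   * Allocates heap buffers that together hold totalBytes, with no single
   * buffer larger than maxAllocationSize (an assumed, user-supplied limit).
   */
  static List<ByteBuffer> allocate(long totalBytes, int maxAllocationSize) {
    if (totalBytes < 0) {
      throw new IllegalArgumentException("totalBytes must be >= 0");
    }
    if (maxAllocationSize <= 0) {
      throw new IllegalArgumentException("maxAllocationSize must be > 0");
    }
    List<ByteBuffer> buffers = new ArrayList<>();
    long remaining = totalBytes;
    while (remaining > 0) {
      // Each buffer is capped at the limit; the last one holds the remainder.
      int size = (int) Math.min(remaining, (long) maxAllocationSize);
      buffers.add(ByteBuffer.allocate(size));
      remaining -= size;
    }
    return buffers;
  }
}
```

For example, BoundedAllocator.allocate(20, 8) returns three buffers with 
capacities 8, 8, and 4. A reader that fills and consumes these one at a time can 
drop its reference to each buffer as soon as that buffer has been read.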

This message was sent by Atlassian JIRA
