William Lo created GOBBLIN-1957:
-----------------------------------

             Summary: Add feature to improve ORCWriter buffer sizes with large record sizes
                 Key: GOBBLIN-1957
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1957
             Project: Apache Gobblin
          Issue Type: Improvement
          Components: gobblin-core
            Reporter: William Lo
            Assignee: Abhishek Tiwari


GobblinORCWriter's self-tuning uses a number of metrics to determine how large 
its buffers should be, both its own internal buffer used for conversion and 
the native ORC writer's buffer. However, when records are very large (hundreds 
of KB each), a buffer at its default maximum size (e.g. 1000 records) can still 
hold a very large amount of data. In practice the buffered data has been 
observed to reach hundreds of megabytes, or even a gigabyte, depending on the 
configured batch size maximums.

We want a configuration that imposes an upper bound on the buffer's maximum 
size, so that the large records held in the buffer never exceed the size of a 
stripe. Then, when the buffer is added to the native ORC writer, the native 
writer should flush its records and free the memory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
