William Lo created GOBBLIN-1957:
-----------------------------------
Summary: Add feature to improve ORCWriter buffer sizes with large
record sizes
Key: GOBBLIN-1957
URL: https://issues.apache.org/jira/browse/GOBBLIN-1957
Project: Apache Gobblin
Issue Type: Improvement
Components: gobblin-core
Reporter: William Lo
Assignee: Abhishek Tiwari

GobblinORCWriter's self-tuning uses a number of metrics to determine how large
its buffers should be, both for its own internal buffer used for conversion
and for the native ORC writer buffer. However, when records are very large
(hundreds of KB), a buffer at its default maximum size (e.g. 1000) can still
hold a very large amount of data. Observed buffer usage ranged from hundreds of
megabytes to as much as a gigabyte, depending on the configured batch size
maximums.

We want a configuration that imposes an upper bound on the maximum buffer size,
so that large records in the buffer do not exceed the size of a stripe. Then,
when the buffer is added to the native ORC writer, the native writer should
flush its records and free the memory.
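
As a rough illustration of the proposed cap, the sketch below limits the buffer
row count so that the buffered bytes stay at or below one stripe. It is not
Gobblin's actual implementation: the class name, method name, and parameters are
all hypothetical, the average record size is assumed to come from whatever
estimate the self-tuning logic already has, and the 64 MiB stripe size is only
an example value.

```java
/** Hypothetical sketch of a stripe-aware buffer cap; not the actual GobblinORCWriter code. */
final class BufferCapSketch {
    private BufferCapSketch() {}

    /**
     * Returns the number of rows the buffer may hold so that
     * rows * avgRecordBytes does not exceed stripeBytes, never going above
     * the configured row maximum and never below one row.
     *
     * All three parameters are assumptions made for illustration.
     */
    static int cappedBatchSize(long avgRecordBytes, long stripeBytes, int configuredMaxRows) {
        if (avgRecordBytes <= 0 || stripeBytes <= 0 || configuredMaxRows <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // How many average-sized records fit in one stripe (integer division rounds down).
        long rowsPerStripe = stripeBytes / avgRecordBytes;
        // A single record larger than a stripe still has to be buffered, so keep at least 1.
        long capped = Math.min((long) configuredMaxRows, Math.max(1L, rowsPerStripe));
        return (int) capped;
    }

    public static void main(String[] args) {
        long stripeBytes = 64L * 1024 * 1024; // example stripe size (64 MiB), an assumption

        // 500 KiB records: an uncapped 1000-row buffer would hold roughly 500 MB.
        System.out.println(cappedBatchSize(500L * 1024, stripeBytes, 1000)); // prints 131

        // 1 KiB records: the configured maximum of 1000 rows already fits in a stripe.
        System.out.println(cappedBatchSize(1024L, stripeBytes, 1000)); // prints 1000
    }
}
```

With small records the configured maximum is unchanged, so a cap of this kind
would only take effect in the large-record case described above.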