Emilio Setiadarma created NIFI-12700:
----------------------------------------
Summary: PutKudu memory optimization for unbatched flush mode
(AUTO_FLUSH_SYNC)
Key: NIFI-12700
URL: https://issues.apache.org/jira/browse/NIFI-12700
Project: Apache NiFi
Issue Type: Improvement
Reporter: Emilio Setiadarma
Assignee: Emilio Setiadarma
The PutKudu processor's existing implementation uses a Map of KuduOperation ->
FlowFile to keep track of which FlowFile was being processed when the
KuduOperation was created. This mapping is eventually used to associate
FlowFiles with the RowError (if any occurs), which is necessary for
transferring FlowFiles to the success/failure relationships and for logging
failures, among other things.
For very large inputs, the KuduOperation objects held in this map can take up a
lot of memory. There is no memory leak, but this can still cause OutOfMemory
errors when the input data is very large.
It may be possible to avoid the KuduOperation -> FlowFile map for unbatched
flush modes, e.g. the AUTO_FLUSH_SYNC flush mode, where KuduSession.apply()
has already flushed the buffer before returning (see
https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
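A rough sketch of the idea, not the actual PutKudu code: in an unbatched mode the result of each apply is known immediately, so any row error can be attached to the current FlowFile on the spot and the operation object does not need to be retained in a map. Everything here is a hypothetical stand-in: the FlowFile is reduced to a String id, the operation to a String, and the applyAndFlush function plays the role of a synchronous KuduSession.apply() call, returning an error message where the real client would presumably expose a RowError on the response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class SyncFlushSketch {

    /** Per-FlowFile result: only the id and any row error messages are kept. */
    public static final class Outcome {
        public final String flowFileId;
        public final List<String> rowErrors;

        public Outcome(String flowFileId, List<String> rowErrors) {
            this.flowFileId = flowFileId;
            this.rowErrors = rowErrors;
        }

        public boolean isSuccess() {
            return rowErrors.isEmpty();
        }
    }

    /**
     * Applies operations one at a time and checks each result right away.
     * No operation -> FlowFile map is built; each operation reference is
     * dropped as soon as its result has been inspected.
     *
     * applyAndFlush is a hypothetical stand-in for a synchronous apply:
     * it returns an error message if the row was rejected, else empty.
     */
    public static Outcome applyUnbatched(String flowFileId,
                                         Iterable<String> operations,
                                         Function<String, Optional<String>> applyAndFlush) {
        List<String> rowErrors = new ArrayList<>();
        for (String operation : operations) {
            // The result belongs to the FlowFile currently being processed,
            // so no lookup is needed to find out which FlowFile failed.
            applyAndFlush.apply(operation).ifPresent(rowErrors::add);
        }
        return new Outcome(flowFileId, rowErrors);
    }

    public static void main(String[] args) {
        // Fake session: rejects any operation whose name starts with "bad".
        Function<String, Optional<String>> fakeSession = op ->
                op.startsWith("bad")
                        ? Optional.of("row error for " + op)
                        : Optional.<String>empty();

        Outcome outcome = applyUnbatched(
                "flowfile-1", Arrays.asList("row-1", "bad-row-2", "row-3"), fakeSession);

        // Route the FlowFile based on whether any row error was seen.
        String relationship = outcome.isSuccess() ? "success" : "failure";
        System.out.println(outcome.flowFileId + " -> " + relationship + " " + outcome.rowErrors);
    }
}
```

With this shape, memory held per FlowFile is proportional to the number of row errors rather than to the number of operations. Batched flush modes would still need some way to map deferred errors back to FlowFiles.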
This Jira captures the effort to refactor the PutKudu processor to make it more
memory-efficient.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)