[
https://issues.apache.org/jira/browse/NIFI-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Burgess updated NIFI-12700:
--------------------------------
Status: Patch Available (was: Open)
> PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
> ----------------------------------------------------------------------
>
> Key: NIFI-12700
> URL: https://issues.apache.org/jira/browse/NIFI-12700
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Emilio Setiadarma
> Assignee: Emilio Setiadarma
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The PutKudu processor's existing implementation uses a Map of KuduOperation
> -> FlowFile to keep track of which FlowFile was being processed when the
> KuduOperation was created. This mapping is eventually used to associate
> FlowFiles with the RowError (if any occurs), which is necessary for
> transferring FlowFiles to the success/failure relationships and for logging
> failures, among other things.
> For very large inputs, the retained KuduOperation objects can grow very
> large. There is no memory leak, but this could still cause OutOfMemory
> issues with very large input data. It may be possible to drop the
> KuduOperation -> FlowFile map for unbatched flush modes (e.g. when using the
> AUTO_FLUSH_SYNC flush mode, where KuduSession.apply() would have already
> flushed the buffer before returning; see
> [https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html|https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html]).
> This Jira attempts to capture the efforts for refactoring the PutKudu
> processor to make it more memory-efficient.
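
A rough sketch of the idea described above, not the actual patch: if each apply call returns only after the write has been flushed, any row error can be attributed to the FlowFile currently being processed, so no KuduOperation -> FlowFile map has to be kept. All names below (SyncFlushSketch, FlowFile, ApplyResult, Routing, route, applyRow) are hypothetical stand-ins so the example runs without NiFi or Kudu on the classpath. In the real client, the result of KuduSession.apply() would play the role of ApplyResult.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Sketch only: every type here is a hypothetical stand-in, not a NiFi or Kudu class.
class SyncFlushSketch {

    /** Stand-in for a FlowFile: a name plus the rows parsed from its content. */
    public static final class FlowFile {
        public final String name;
        public final List<String> rows;

        public FlowFile(String name, List<String> rows) {
            this.name = name;
            this.rows = rows;
        }
    }

    /** Stand-in for the result of one synchronous apply; a null rowError means success. */
    public static final class ApplyResult {
        public final String rowError;

        public ApplyResult(String rowError) {
            this.rowError = rowError;
        }

        public boolean hasRowError() {
            return rowError != null;
        }
    }

    /** FlowFiles grouped by the relationship they would be transferred to. */
    public static final class Routing {
        public final List<FlowFile> success = new ArrayList<>();
        public final List<FlowFile> failure = new ArrayList<>();
    }

    /**
     * Applies every row of every FlowFile through applyRow, which is assumed to
     * behave synchronously (the result is final once the call returns).
     * No operation -> FlowFile map is built or retained.
     */
    public static Routing route(List<FlowFile> flowFiles, Function<String, ApplyResult> applyRow) {
        Routing routing = new Routing();
        for (FlowFile flowFile : flowFiles) {
            boolean failed = false;
            for (String row : flowFile.rows) {
                // The error, if any, belongs to the FlowFile in scope right now.
                ApplyResult result = applyRow.apply(row);
                if (result.hasRowError()) {
                    failed = true;
                }
            }
            if (failed) {
                routing.failure.add(flowFile);
            } else {
                routing.success.add(flowFile);
            }
        }
        return routing;
    }

    public static void main(String[] args) {
        List<FlowFile> input = Arrays.asList(
                new FlowFile("a", Arrays.asList("r1", "r2")),
                new FlowFile("b", Arrays.asList("r3", "bad")));
        // Hypothetical writer that rejects the row "bad" and accepts everything else.
        Routing routing = route(input, row -> new ApplyResult("bad".equals(row) ? "rejected" : null));
        System.out.println("success=" + routing.success.size() + " failure=" + routing.failure.size());
    }
}
```

Batched flush modes would still need some way to map a later error back to its FlowFile, since the result is not known when apply returns. That is why the sketch covers only the unbatched case.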
--
This message was sent by Atlassian Jira
(v8.20.10#820010)