[ 
https://issues.apache.org/jira/browse/NIFI-12700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-12700:
--------------------------------
    Status: Patch Available  (was: Open)

> PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)
> ----------------------------------------------------------------------
>
>                 Key: NIFI-12700
>                 URL: https://issues.apache.org/jira/browse/NIFI-12700
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Emilio Setiadarma
>            Assignee: Emilio Setiadarma
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The PutKudu processor's existing implementation uses a Map of KuduOperation 
> -> FlowFile to keep track of which FlowFile was being processed when the 
> KuduOperation was created. This mapping is eventually used to associate 
> FlowFiles with a RowError (if any occurs), an association that is necessary 
> for transferring FlowFiles to the success/failure relationships and for 
> logging failures, among other things. 
> For very large inputs, the KuduOperation objects retained in this map can 
> take up a lot of memory. There is no memory leak, but this could still cause 
> OutOfMemory issues for very large input data. It may be possible to avoid the 
> KuduOperation -> FlowFile map for unbatched flush modes (e.g. when using the 
> AUTO_FLUSH_SYNC flush mode, where KuduSession.apply() will already have 
> flushed the buffer before returning; see 
> https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
> This Jira captures the effort to refactor the PutKudu processor to make it 
> more memory efficient.
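
The idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PutKudu code or the patch attached to this issue: Operation, Response, Session and Outcome are hypothetical stand-ins (they are not the Kudu client or NiFi classes), FlowFiles are represented by plain string ids, and a row whose name starts with "bad" is assumed to produce a row error. The toy Session only mirrors the behavior the description attributes to each mode: in the batched mode the result is known only after flush(), while in the unbatched mode apply() returns it directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins only: none of these types are the real Kudu or NiFi classes.
class FlushModeSketch {

    static final class Operation {
        final String row;
        Operation(String row) { this.row = row; }
    }

    static final class Response {
        final Operation operation;
        final boolean rowError;
        Response(Operation operation, boolean rowError) {
            this.operation = operation;
            this.rowError = rowError;
        }
    }

    /** Toy session: sync mode answers in apply(), batched mode answers in flush(). */
    static final class Session {
        private final boolean sync;
        private final List<Operation> buffer = new ArrayList<>();
        Session(boolean sync) { this.sync = sync; }

        Response apply(Operation op) {
            if (sync) return check(op);   // result known before returning
            buffer.add(op);               // result deferred until flush()
            return null;
        }

        List<Response> flush() {
            List<Response> responses = new ArrayList<>();
            for (Operation op : buffer) responses.add(check(op));
            buffer.clear();
            return responses;
        }

        // Assumption for the sketch: rows starting with "bad" fail.
        private static Response check(Operation op) {
            return new Response(op, op.row.startsWith("bad"));
        }
    }

    static final class Outcome {
        final Set<String> failedFlowFiles = new LinkedHashSet<>();
        int trackedOperations; // how many operations were held in the map
    }

    // Batched mode: every operation must be remembered until the flush reports errors.
    static Outcome runBatched(Map<String, List<String>> rowsByFlowFile) {
        Session session = new Session(false);
        Map<Operation, String> operationToFlowFile = new IdentityHashMap<>();
        Outcome outcome = new Outcome();
        for (Map.Entry<String, List<String>> e : rowsByFlowFile.entrySet()) {
            for (String row : e.getValue()) {
                Operation op = new Operation(row);
                operationToFlowFile.put(op, e.getKey());
                session.apply(op);
            }
        }
        outcome.trackedOperations = operationToFlowFile.size();
        for (Response r : session.flush()) {
            if (r.rowError) outcome.failedFlowFiles.add(operationToFlowFile.get(r.operation));
        }
        return outcome;
    }

    // Unbatched mode: the current FlowFile is still in scope when the result arrives,
    // so no operation -> FlowFile map is needed.
    static Outcome runSync(Map<String, List<String>> rowsByFlowFile) {
        Session session = new Session(true);
        Outcome outcome = new Outcome();
        for (Map.Entry<String, List<String>> e : rowsByFlowFile.entrySet()) {
            for (String row : e.getValue()) {
                Response r = session.apply(new Operation(row));
                if (r.rowError) outcome.failedFlowFiles.add(e.getKey());
            }
        }
        return outcome;
    }

    static Map<String, List<String>> sampleInput() {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("flowfile-1", Arrays.asList("ok-1", "ok-2"));
        input.put("flowfile-2", Arrays.asList("ok-3", "bad-4"));
        return input;
    }

    public static void main(String[] args) {
        Outcome batched = runBatched(sampleInput());
        Outcome sync = runSync(sampleInput());
        System.out.println("batched: " + batched.failedFlowFiles + " tracked=" + batched.trackedOperations);
        System.out.println("sync:    " + sync.failedFlowFiles + " tracked=" + sync.trackedOperations);
    }
}
```

Both paths identify the same failed FlowFile, but only the batched path has to hold every operation until the flush, which is where the memory growth described above would come from.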



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
