[ 
https://issues.apache.org/jira/browse/NIFI-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann updated NIFI-9931:
-----------------------------------
    Priority: Minor  (was: Major)

> OutOfMemoryError from EvaluateXPath processor halts all FlowFiles from 
> upstream
> -------------------------------------------------------------------------------
>
>                 Key: NIFI-9931
>                 URL: https://issues.apache.org/jira/browse/NIFI-9931
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: endzeit
>            Assignee: endzeit
>            Priority: Minor
>
> For some of our flows in Apache NiFi we need to extract information from 
> XML files for later use. As we need to transform the FlowFile's content while 
> retaining that information, we extract the required bits into FlowFile 
> attributes.
> Most of the time we use the _EvaluateXPath_ processor for this, which works 
> like a charm in 99.99% of cases.
> However, we recently had a minor outage caused by the processor. Normally the 
> content inside the tag is quite small and can be put into the FlowFile 
> attributes (and thus in RAM) without problems. An incorrectly processed XML 
> file reached the processor with unusually large content in one of the tags 
> we extract into FlowFile attributes. This resulted in an _OutOfMemoryError_ 
> and the processor yielding. As the FlowFile's content did not change, every 
> subsequent attempt to extract the data resulted in the same 
> _OutOfMemoryError_, and the processor yielded again and again.
> Ultimately, this blocked all subsequent FlowFiles upstream and effectively 
> brought processing to a halt.
> ----
> That's why we'd like to propose (and contribute, if accepted) an extension to 
> the _EvaluateXPath_ processor to mitigate, or at least reduce, the risk of 
> this behaviour occurring.
> We thought about a new (optional) property which limits the number of 
> characters / bytes allowed for each extracted tag. This "_Maximum Attribute 
> Size_" would only take effect when set and when the _Destination_ is set to 
> _flowfile-attribute_. If any extraction would reach this limit, the 
> FlowFile should be moved to the _failure_ relationship instead of yielding 
> the processor and blocking the upstream.
> However, other ideas and proposals are welcome as well. This will not be a 
> complete solution to the problem, but it should reduce the probability of it 
> happening.
> ------
> As a "quick fix" to mitigate the error for now, we placed a 
> _RouteOnAttribute_ processor in front of every _EvaluateXPath_ processor. It 
> filters out any FlowFiles whose content exceeds an arbitrary size, chosen 
> based on FlowFiles we know were processed successfully in the past.
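
The routing rule proposed in the description could look roughly like the following. This is only a minimal sketch, not the actual EvaluateXPath implementation: the class, the `Route` enum, and the `route` method are hypothetical names, the limit is assumed to be measured in UTF-8 bytes, and "reaching the limit" is assumed to mean exceeding it. It also checks values that are already in memory, whereas a real fix would have to bound the size during extraction to avoid the OutOfMemoryError in the first place.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical illustration of the proposed "Maximum Attribute Size" check.
// It does not use the NiFi API; Route stands in for the processor's relationships.
class MaxAttributeSizeSketch {

    enum Route { MATCHED, FAILURE }

    // Returns FAILURE if the limit is set and any extracted value exceeds it
    // (assumption: size is counted in UTF-8 bytes). An unset limit disables the check.
    static Route route(Map<String, String> extracted, OptionalInt maxAttributeSizeBytes) {
        if (!maxAttributeSizeBytes.isPresent()) {
            return Route.MATCHED;
        }
        int limit = maxAttributeSizeBytes.getAsInt();
        for (String value : extracted.values()) {
            if (value.getBytes(StandardCharsets.UTF_8).length > limit) {
                return Route.FAILURE;
            }
        }
        return Route.MATCHED;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("order.id", "abc");          // 3 bytes
        extracted.put("order.note", "0123456789X"); // 11 bytes

        System.out.println(route(extracted, OptionalInt.of(10)));   // FAILURE
        System.out.println(route(extracted, OptionalInt.of(100)));  // MATCHED
        System.out.println(route(extracted, OptionalInt.empty()));  // MATCHED
    }
}
```

Routing the oversized FlowFile to failure, rather than yielding, is what keeps the upstream queue moving in this sketch: the offending FlowFile leaves the queue instead of being retried indefinitely.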



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
