[
https://issues.apache.org/jira/browse/NIFI-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann updated NIFI-9931:
-----------------------------------
Priority: Minor (was: Major)
> OutOfMemoryError from EvaluateXPath processor halts all FlowFiles from
> upstream
> -------------------------------------------------------------------------------
>
> Key: NIFI-9931
> URL: https://issues.apache.org/jira/browse/NIFI-9931
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: endzeit
> Assignee: endzeit
> Priority: Minor
>
> For some of our flows in Apache NiFi we need to extract information out of
> XML files for later use. As we need to transform the FlowFile's content while
> retaining that information, we extract the required bits into FlowFile
> attributes.
> Most of the time we use the _EvaluateXPath_ processor for this, which works
> like a charm in 99.99% of cases.
> However, we recently had a minor outage caused by the processor. Normally the
> content inside the tag is quite small and can be put into the FlowFile
> attributes (and thus in RAM) without problems. An incorrectly processed XML
> file with unusually large content in one of the XML tags we extract to the
> FlowFile attributes reached the processor, which resulted in an
> _OutOfMemoryError_ and the processor itself yielding. As the FlowFile's
> content did not change, all subsequent attempts to extract the data resulted
> in the same _OutOfMemoryError_ and the processor yielding again and again.
> Ultimately, this blocked all subsequent FlowFiles upstream and effectively
> brought processing to a halt.
> ----
> That's why we'd like to propose (and contribute, if accepted) an extension to
> the _EvaluateXPath_ processor to mitigate or at least reduce the risk of
> this behaviour occurring.
> We thought about a new (optional) property which limits the number of
> characters / bytes allowed for each extracted tag. This "_Maximum Attribute
> Size_" would only take effect when set and when the _Destination_ is set to
> _flowfile-attribute_. If any extraction would reach this limit, the
> FlowFile should be moved to the _failure_ relationship instead of yielding
> the processor and blocking the upstream.
> However, other ideas and proposals are welcome as well. This will not be a
> complete solution to the problem, but should limit the probability of it
> happening.
> ------
> As a "quick fix" to mitigate the error for now, we placed a
> _RouteOnAttribute_ processor in front of every _EvaluateXPath_ processor. It
> filters out any FlowFiles whose content exceeds an arbitrary size, chosen
> based on FlowFiles we know were processed successfully in the past.
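
For illustration only, the proposed "Maximum Attribute Size" check could be sketched roughly as below. This is not NiFi code: the class, enum, and method names are hypothetical, the limit is counted in characters (the issue leaves characters vs. bytes open), and a value is treated as too large only when it is strictly longer than the limit. A real change would also have to apply the check before the oversized value is fully held in memory, which this sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; none of these names exist in Apache NiFi.
final class MaxAttributeSizeSketch {

    // Stand-ins for the processor relationships mentioned in the issue.
    enum Route { MATCHED, FAILURE }

    // Returns true when a limit is set and the value is longer than it.
    // A null limit models the optional property being left unset.
    static boolean exceedsLimit(String value, Integer maxChars) {
        return maxChars != null && value.length() > maxChars;
    }

    // Routes to FAILURE as soon as any extracted value is too large,
    // instead of letting the oversized value become a FlowFile attribute.
    static Route route(Map<String, String> extracted, Integer maxChars) {
        for (String value : extracted.values()) {
            if (exceedsLimit(value, maxChars)) {
                return Route.FAILURE;
            }
        }
        return Route.MATCHED;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("title", "short value");
        extracted.put("body", "x".repeat(5_000));

        System.out.println(route(extracted, 1_000)); // limit set
        System.out.println(route(extracted, null));  // property unset
    }
}
```

With the limit set, the FlowFile carrying the oversized "body" value would be routed to failure rather than retried indefinitely; with the property unset, behaviour stays as it is today.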
--
This message was sent by Atlassian Jira
(v8.20.7#820007)