endzeit created NIFI-9931:
-----------------------------

             Summary: OutOfMemoryError from EvaluateXPath processor halts all 
FlowFiles from upstream
                 Key: NIFI-9931
                 URL: https://issues.apache.org/jira/browse/NIFI-9931
             Project: Apache NiFi
          Issue Type: Bug
    Affects Versions: 1.16.0
            Reporter: endzeit
            Assignee: endzeit


For some of our flows in Apache NiFi, we need to extract information from XML 
files for later use. Because we need to transform the FlowFile's content while 
retaining that information, we extract the required bits into FlowFile 
attributes.

We mostly use the _EvaluateXPath_ processor for this, which works like a charm 
in 99.99% of cases.

However, we recently had a minor outage caused by the processor. Normally, the 
content inside the tag is quite small and can be put into the FlowFile 
attributes (and thus in RAM) without problems. This time, an incorrectly 
processed XML file reached the processor with unusually large content in one 
of the tags we extract into FlowFile attributes. This resulted in an 
_OutOfMemoryError_ and the processor yielding. As the FlowFile's content did 
not change, every subsequent attempt to extract the data resulted in the same 
_OutOfMemoryError_, and the processor yielded again and again.

Ultimately, this blocked all FlowFiles queued behind it upstream and 
effectively brought processing to a halt.

-----

That's why we'd like to propose (and contribute, if accepted) an extension to 
the _EvaluateXPath_ processor to prevent, or at least reduce the risk of, this 
behaviour.

We thought about a new (optional) property that limits the number of 
characters / bytes allowed for each extracted tag. This "_Maximum Attribute 
Size_" would only take effect when it is set and the _Destination_ is set to 
_flowfile-attribute_.
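
To make the intended semantics concrete, here is a minimal sketch of the size 
check such a property might perform. It is not taken from the NiFi code base: 
the class _AttributeSizeGuard_, the method _exceedsLimit_ and the choice to 
measure UTF-8 bytes are all assumptions for illustration. It also only checks 
a value that has already been extracted, so a real implementation would 
probably have to enforce the limit earlier to avoid the _OutOfMemoryError_ 
itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of NiFi: illustrates how an optional
// "Maximum Attribute Size" limit could be applied to an extracted value.
final class AttributeSizeGuard {

    private AttributeSizeGuard() {
    }

    /**
     * Returns true if the extracted value is larger than the configured limit.
     * A null limit means the (optional) property is not set, so nothing is rejected.
     * Assumption: the limit is expressed in bytes of the UTF-8 encoded value.
     */
    static boolean exceedsLimit(String extractedValue, Integer maxAttributeSizeBytes) {
        if (maxAttributeSizeBytes == null) {
            return false;
        }
        int sizeInBytes = extractedValue.getBytes(StandardCharsets.UTF_8).length;
        return sizeInBytes > maxAttributeSizeBytes;
    }

    public static void main(String[] args) {
        String small = "small";
        String large = "x".repeat(2048);

        // Small value with a 1024-byte limit: within the limit.
        System.out.println(exceedsLimit(small, 1024));
        // 2048-byte value with a 1024-byte limit: over the limit.
        System.out.println(exceedsLimit(large, 1024));
        // Property not set: the limit is never applied.
        System.out.println(exceedsLimit(large, null));
    }
}
```

A FlowFile whose extracted value exceeds the limit could then be routed to a 
failure relationship instead of having the value written to an attribute; 
which relationship to use is left open here.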

However, other ideas and proposals are welcomed as well.

------

As a "quick fix" to mitigate the error for now, we placed a _RouteOnAttribute_ 
processor in front of every _EvaluateXPath_ processor. It filters out any 
FlowFiles whose content exceeds an arbitrary size limit, chosen based on 
FlowFiles we know were processed successfully in the past.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)