[
https://issues.apache.org/jira/browse/NIFI-9931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
endzeit updated NIFI-9931:
--------------------------
Description:
For some of our flows in Apache NiFi we need to extract information from XML
files for later use. Since we need to transform the FlowFile's content while
retaining that information, we extract the required bits into FlowFile
attributes.
We use the _EvaluateXPath_ processor for this most of the time, and it works
like a charm in 99.99% of cases.
However, we recently had a minor outage caused by this processor. Normally the
content inside the tags is quite small and can be put into FlowFile attributes
(and thus into RAM) without problems. An erroneously generated XML file, with
unusually large content in one of the XML tags we extract into FlowFile
attributes, reached the processor, which resulted in an _OutOfMemoryError_ and
the processor yielding. As the FlowFile's content did not change, all
subsequent attempts to extract the data resulted in the same _OutOfMemoryError_
and the processor yielding again and again.
Ultimately, this blocked all following FlowFiles in the upstream connection and
effectively brought processing to a halt.
-----
That's why we'd like to propose (and contribute, if accepted) an extension to
the _EvaluateXPath_ processor that mitigates, or at least reduces, the risk of
this behaviour occurring.
We are thinking of a new (optional) property that limits the number of
characters / bytes allowed for each extracted tag. This "_Maximum Attribute
Size_" would only take effect when set and when _Destination_ is set to
_flowfile-attribute_. If any extraction were to exceed this limit, the FlowFile
should be routed to the _failure_ relationship instead of yielding the
processor and blocking the upstream connection.
However, other ideas and proposals are welcome as well. This will not be a
complete solution to the problem, but it should reduce the probability of it
happening.
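A minimal sketch of the proposed check, assuming a byte-based limit; the class
and method names are purely illustrative and do not reflect the actual NiFi
processor API:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed "Maximum Attribute Size" check.
// In the real processor, a value failing this check would cause the
// FlowFile to be routed to the 'failure' relationship.
public class MaxAttributeSizeCheck {
    /** Returns true when the extracted tag content exceeds the configured limit. */
    static boolean exceedsLimit(String extractedValue, long maxBytes) {
        // Measure the UTF-8 encoded size of the value destined for an attribute.
        return extractedValue.getBytes(StandardCharsets.UTF_8).length > maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimit("small", 1024));          // prints false
        System.out.println(exceedsLimit("x".repeat(2048), 1024)); // prints true
    }
}
```

Whether the limit should count bytes or characters (and how to treat multi-byte
encodings) would be part of the design discussion.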
------
As a "quick fix" to mitigate the error for now, we prepended every
_EvaluateXPath_ processor with a _RouteOnAttribute_ processor that filters out
any FlowFiles whose content exceeds an arbitrary size limit, chosen based on
file sizes we know have been processed successfully in the past.
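Such a _RouteOnAttribute_ filter can be expressed with NiFi Expression Language
against the standard _fileSize_ attribute (the FlowFile's content size in
bytes); the 1 MiB threshold here is purely illustrative:

```
${fileSize:lt(1048576)}
```

FlowFiles matching this expression are small enough to process; anything larger
is routed away before it can reach _EvaluateXPath_.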
was:
For some of our flows in Apache NiFi we need to extract information from XML
files for later use. Since we need to transform the FlowFile's content while
retaining that information, we extract the required bits into FlowFile
attributes.
We use the _EvaluateXPath_ processor for this most of the time, and it works
like a charm in 99.99% of cases.
However, we recently had a minor outage caused by this processor. Normally the
content inside the tags is quite small and can be put into FlowFile attributes
(and thus into RAM) without problems. An erroneously generated XML file, with
unusually large content in one of the XML tags we extract into FlowFile
attributes, reached the processor, which resulted in an _OutOfMemoryError_ and
the processor yielding. As the FlowFile's content did not change, all
subsequent attempts to extract the data resulted in the same _OutOfMemoryError_
and the processor yielding again and again.
Ultimately, this blocked all following FlowFiles in the upstream connection and
effectively brought processing to a halt.
-----
That's why we'd like to propose (and contribute, if accepted) an extension to
the _EvaluateXPath_ processor that mitigates, or at least reduces, the risk of
this behaviour occurring.
We are thinking of a new (optional) property that limits the number of
characters / bytes allowed for each extracted tag. This "_Maximum Attribute
Size_" would only take effect when set and when _Destination_ is set to
_flowfile-attribute_.
However, other ideas and proposals are welcome as well.
------
As a "quick fix" to mitigate the error for now, we prepended every
_EvaluateXPath_ processor with a _RouteOnAttribute_ processor that filters out
any FlowFiles whose content exceeds an arbitrary size limit, chosen based on
file sizes we know have been processed successfully in the past.
> OutOfMemoryError from EvaluateXPath processor halts all FlowFiles from
> upstream
> -------------------------------------------------------------------------------
>
> Key: NIFI-9931
> URL: https://issues.apache.org/jira/browse/NIFI-9931
> Project: Apache NiFi
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: endzeit
> Assignee: endzeit
> Priority: Major
>
> For some of our flows in Apache NiFi we need to extract information from XML
> files for later use. Since we need to transform the FlowFile's content while
> retaining that information, we extract the required bits into FlowFile
> attributes.
> We use the _EvaluateXPath_ processor for this most of the time, and it works
> like a charm in 99.99% of cases.
> However, we recently had a minor outage caused by this processor. Normally
> the content inside the tags is quite small and can be put into FlowFile
> attributes (and thus into RAM) without problems. An erroneously generated XML
> file, with unusually large content in one of the XML tags we extract into
> FlowFile attributes, reached the processor, which resulted in an
> _OutOfMemoryError_ and the processor yielding. As the FlowFile's content did
> not change, all subsequent attempts to extract the data resulted in the same
> _OutOfMemoryError_ and the processor yielding again and again.
> Ultimately, this blocked all following FlowFiles in the upstream connection
> and effectively brought processing to a halt.
> -----
> That's why we'd like to propose (and contribute, if accepted) an extension to
> the _EvaluateXPath_ processor that mitigates, or at least reduces, the risk
> of this behaviour occurring.
> We are thinking of a new (optional) property that limits the number of
> characters / bytes allowed for each extracted tag. This "_Maximum Attribute
> Size_" would only take effect when set and when _Destination_ is set to
> _flowfile-attribute_. If any extraction were to exceed this limit, the
> FlowFile should be routed to the _failure_ relationship instead of yielding
> the processor and blocking the upstream connection.
> However, other ideas and proposals are welcome as well. This will not be a
> complete solution to the problem, but it should reduce the probability of it
> happening.
> ------
> As a "quick fix" to mitigate the error for now, we prepended every
> _EvaluateXPath_ processor with a _RouteOnAttribute_ processor that filters
> out any FlowFiles whose content exceeds an arbitrary size limit, chosen based
> on file sizes we know have been processed successfully in the past.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)