Mark Payne created NIFI-9704:
--------------------------------
Summary: Improve verbose diagnostics and change default value of
"nifi.content.claim.max.appendable.size" from 1 MB to 50 KB
Key: NIFI-9704
URL: https://issues.apache.org/jira/browse/NIFI-9704
Project: Apache NiFi
Issue Type: Improvement
Components: Configuration, Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
Fix For: 1.16.0
We sometimes see users (especially those with a mix of flows where some produce
very large FlowFiles and some produce tons of tiny FlowFiles) run into an issue
where the UI shows very little space is used up by FlowFiles but the content
repository fills up.
Yesterday I was on a call with such a team. Their NiFi UI showed that one node had
about 200,000 FlowFiles totaling dozens of MB. However, the content repository had
grown to 300 GB, filling the entire disk allotted to it. As a result, their NiFi
instance stopped processing data because the content repo was completely full.
We did some analysis to check whether "orphaned" FlowFiles were filling the
content repository, but there were none. Instead, the {{nifi.sh diagnostics
--verbose}} command showed us that a handful of queues were causing the content
repo to retain those hundreds of GB of data, even though the FlowFiles themselves
only amounted to a few MB.
This is a known issue, caused by how we write FlowFile content to disk: many
content claims share the same file on disk. By default, we allow up to 1 MB to be
written to a file before concluding that no additional FlowFiles should be
appended to it. This is controlled by the
"nifi.content.claim.max.appendable.size" property.
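For reference, the property lives in {{conf/nifi.properties}}; the change proposed here is simply lowering its default value:

```properties
# conf/nifi.properties
# Default before this change:
nifi.content.claim.max.appendable.size=1 MB
# Proposed default:
nifi.content.claim.max.appendable.size=50 KB
```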
The support team indicates that this happens frequently. We need to change the
default value of this property from "1 MB" to "50 KB". This will dramatically
decrease the incidence rate.
I set up a flow to test this locally. With 5,000 FlowFiles queued up totaling
610 KB, the content repo was taking 45 GB of disk space. I then dropped all data,
changed this property from the default of 1 MB to 50 KB, and repeated the test. As
expected, when I queued up the same number of FlowFiles (610 KB worth), the
content repo occupied only 2.6 GB of disk space. I.e., making the value roughly
5% of the original resulted in occupying only about 5% as much "unnecessary" disk
space.
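A hypothetical back-of-envelope model (not NiFi code) of why lowering the cap helps: each small queued FlowFile can pin an entire shared claim file on disk whose other contents belong to FlowFiles that were already deleted, so the worst-case retained space scales with the appendable-size cap rather than with the FlowFiles' own sizes:

```python
# Back-of-envelope sketch, assuming the worst case: each tiny queued FlowFile
# lands in a distinct claim file that is otherwise full of already-deleted
# content, so the whole file (up to the appendable-size cap) stays on disk.

def worst_case_retained_bytes(num_flowfiles: int, max_appendable_size: int) -> int:
    """Rough upper bound on content-repo space pinned by tiny FlowFiles."""
    return num_flowfiles * max_appendable_size

MB = 1024 * 1024
KB = 1024

# Old default (1 MB): 5,000 tiny FlowFiles can pin roughly 5 GB.
print(worst_case_retained_bytes(5000, 1 * MB) / (1024 ** 3), "GiB")
# New default (50 KB): the same queue pins only a few hundred MB.
print(worst_case_retained_bytes(5000, 50 * KB) / (1024 ** 2), "MiB")
```

The cap ratio (50 KB / 1 MB, about 5%) lines up with the roughly 5% of "unnecessary" disk space observed in the local test above.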
Performance tests indicate that throughput was approximately the same, regardless
of whether I used "1 MB" or "50 KB".
Additionally, when running the {{nifi.sh diagnostics --verbose}} command, the
information necessary for tracking down the root cause was available but took
tremendous effort to decipher. We should update the diagnostics output so that,
when scanning the content repo, it shows the amount of data in the content repo
that is being retained by each queue in the flow.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)