Mark Payne created NIFI-9704:
--------------------------------

             Summary: Improve verbose diagnostics and change default value of 
"nifi.content.claim.max.appendable.size" from 1 MB to 50 KB
                 Key: NIFI-9704
                 URL: https://issues.apache.org/jira/browse/NIFI-9704
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Configuration, Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne
             Fix For: 1.16.0


We sometimes see users (especially those with a mix of flows, where some produce 
very large FlowFiles and others produce huge numbers of tiny FlowFiles) run into 
an issue where the UI shows that FlowFiles occupy very little space, yet the 
content repository fills up.

Yesterday I was on a call with such a team. Their NiFi UI showed one node had 
about 200,000 FlowFiles totaling dozens of MB. However, the content repository 
occupied 300 GB, the entire disk allocated to it. As a result, their NiFi 
instance stopped processing data because the content repo was completely full.

We did some analysis to check whether "orphaned" FlowFiles were filling the 
content repository, but there were none. Instead, the {{nifi.sh diagnostics 
--verbose}} command showed that a handful of queues were causing the content 
repo to retain those hundreds of GB of data, even though the FlowFiles 
themselves amounted to only a few MB.

This is a known issue, caused by how we write FlowFile content to disk: many 
content claims share the same file on disk. By default, we allow up to 1 MB to 
be written to a file before concluding that no additional FlowFile content 
should be appended to it. This is controlled by the 
"nifi.content.claim.max.appendable.size" property.
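
With the proposed default, the relevant entry in conf/nifi.properties would 
read as follows (a sketch of the proposed configuration, not the current 
shipped default):

{code}
# conf/nifi.properties
# Maximum amount of data written to a single content-claim file before
# NiFi stops appending further FlowFile content to it.
# Current default: 1 MB; proposed default: 50 KB
nifi.content.claim.max.appendable.size=50 KB
{code}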

The support team indicates that this happens frequently. We should change the 
default value of this property from "1 MB" to "50 KB", which will dramatically 
decrease the incidence rate.

I set up a flow to test this locally. With the default of 1 MB, I queued up 
5,000 FlowFiles totaling 610 KB, and the content repo consumed 45 GB of disk 
space. I then dropped all data, changed the property from 1 MB to 50 KB, and 
repeated the test. As expected, after queuing up the same number of files 
(610 KB worth), the content repo occupied only 2.6 GB of disk space. That is, 
reducing the value to roughly 5% of the original resulted in occupying only 
roughly 5% as much "unnecessary" disk space.
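
The proportionality above can be illustrated with a toy model (not NiFi code). 
It assumes the worst case, where each queued tiny FlowFile pins a separate 
claim file that has grown to the maximum appendable size; the function name 
and scenario are hypothetical, for illustration only:

{code:python}
def retained_bytes(num_claims: int, max_appendable_size: int) -> int:
    """Worst-case disk space retained when each queued FlowFile pins a
    separate claim file that grew to the maximum appendable size."""
    return num_claims * max_appendable_size


KB = 1024
MB = 1024 * KB

# 5,000 tiny FlowFiles, as in the local test above
old = retained_bytes(5_000, 1 * MB)    # default: 1 MB claims
new = retained_bytes(5_000, 50 * KB)   # proposed: 50 KB claims

# Shrinking the claim size to ~5% of the default shrinks the
# worst-case retained space by the same factor.
print(new / old)  # 50 KB / 1 MB = 50/1024, i.e. ~4.9%
{code}

This is only a model of the retention mechanism; real claim files can exceed 
the threshold when a single large FlowFile is written, so measured numbers 
will differ.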

Performance tests indicate that throughput was approximately the same, 
regardless of whether I used "1 MB" or "50 KB".

Additionally, when running the {{nifi.sh diagnostics --verbose}} command, the 
information necessary to track down the root cause was available but took 
tremendous effort to decipher. We should update the diagnostics output so 
that, when scanning the content repo, it shows the amount of data being 
retained by each queue in the flow.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
