[
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-11557:
------------------------------
Summary: Eliminate use of Files.walkFileTree for any performance-critical
parts of application (was: Eliminate use of NIO.2 for any performance-critical
parts of application)
> Eliminate use of Files.walkFileTree for any performance-critical parts of
> application
> -------------------------------------------------------------------------------------
>
> Key: NIFI-11557
> URL: https://issues.apache.org/jira/browse/NIFI-11557
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework, Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 1.latest, 2.latest
>
>
> The FileSystemRepository (content repo implementation) as well as ListFile
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a
> user who had horribly long startup times. Thread dumps show that the time was
> almost entirely in the FileSystemRepository's {{initializeRepository}} method
> as it is walking the file tree in order to determine which archive files can
> be cleaned up next. This is done during startup and again periodically in
> background threads.
> I made a small modification locally to instead use the standard synchronous
> IO methods (the {{File.listFiles}} method). I used GenerateFlowFile to generate
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in
> nifi.properties in order to generate a huge number of files - about 1.2
> million files in the content repository and restarted a few times.
> Additionally, I added some log lines to show how long this part of the startup
> process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new
> implementation, it took 6.7 seconds. This appears to be due to the fact that
> when using NIO.2 for every file, it does an individual disk access to obtain
> File attributes, while when using the {{File.listFiles}} method the File
> objects that are returned already have the necessary attributes. As a result,
> the NIO.2 approach makes millions of disk accesses that are unnecessary. As
> the number of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any
> performance-critical parts of the codebase.
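For readers who want to try the comparison described above on their own directory tree, here is a minimal sketch of the two traversal styles. It is not the actual NiFi change: the class name RepoScanSketch and both method names are hypothetical, and it only counts regular files rather than deciding which archive files can be cleaned up. Timings will depend on the filesystem, the JVM version, and OS caching, so it should not be expected to reproduce the numbers quoted above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not code from the NiFi codebase.
public class RepoScanSketch {

    // NIO.2 style: the walker hands visitFile a BasicFileAttributes
    // instance for every file it encounters.
    static long countWithWalkFileTree(Path root) throws IOException {
        AtomicLong count = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                count.incrementAndGet();
                return FileVisitResult.CONTINUE;
            }
        });
        return count.get();
    }

    // java.io style: list each directory with File.listFiles and descend
    // using an explicit stack instead of recursion.
    static long countWithListFiles(File root) {
        long count = 0;
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File[] children = pending.pop().listFiles();
            if (children == null) {
                continue; // not a directory, or the listing failed
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child);
                } else {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current working directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");

        long start = System.nanoTime();
        long walked = countWithWalkFileTree(root);
        long walkMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        long listed = countWithListFiles(root.toFile());
        long listMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("walkFileTree: " + walked + " files in " + walkMillis + " ms");
        System.out.println("listFiles:    " + listed + " files in " + listMillis + " ms");
    }
}
```

Running it against a directory with a very large number of small files (for example, a test content repository populated as described above) and swapping the order of the two calls between runs helps separate the traversal cost from OS cache warm-up.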