[
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725565#comment-17725565
]
ASF subversion and git services commented on NIFI-11557:
--------------------------------------------------------
Commit a84a7cb60aa3bbae900ade5d1f2413b71dabdf38 in nifi's branch
refs/heads/support/nifi-1.x from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a84a7cb60a ]
NIFI-11557: Fixed error with Java 11 code
> Eliminate use of Files.walkFileTree for any performance-critical parts of
> application
> -------------------------------------------------------------------------------------
>
> Key: NIFI-11557
> URL: https://issues.apache.org/jira/browse/NIFI-11557
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework, Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Labels: content-repo, content-repository, performance, slowness,
> startup
> Fix For: 2.0.0, 1.22.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The FileSystemRepository (content repo implementation) as well as ListFile
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a
> user who had horribly long startup times. Thread dumps show that the time was
> almost entirely in the FileSystemRepository's {{initializeRepository}} method
> as it is walking the file tree in order to determine which archive files can
> be cleaned up next. This is done during startup and again periodically in
> background threads.
> I made a small modification locally to instead use the standard synchronous
> IO methods (the {{File.listFiles}} method). I used GenerateFlowFile to generate
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in
> nifi.properties in order to generate a huge number of files (about 1.2
> million files in the content repository), then restarted a few times.
> Additionally, I added some log lines to show how long this part of the
> startup process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new
> implementation, it took 6.7 seconds. This appears to be due to the fact that
> NIO.2 does an individual disk access for every file to obtain its file
> attributes, while with the {{File.listFiles}} method the File objects that
> are returned already have the necessary attributes. As a result, the NIO.2
> approach makes millions of disk accesses that are unnecessary. As the number
> of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any
> performance-critical parts of the codebase.
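
For illustration, the two traversal styles contrasted in the description might be sketched as below. This is a minimal sketch, not the NiFi change itself: the class name RepoScanSketch and both method names are hypothetical, and the code only counts regular files rather than deciding which archive files to clean up. Whether the timing gap reported above reproduces will depend on the JDK, operating system, and filesystem.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper class for illustration; not taken from the NiFi codebase.
class RepoScanSketch {

    // Counts regular files under root using Files.walkFileTree (NIO.2).
    static long countWithWalkFileTree(Path root) throws IOException {
        AtomicLong count = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    count.incrementAndGet();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return count.get();
    }

    // Counts regular files under root using File.listFiles and an explicit
    // stack of directories still to be scanned (no recursion).
    static long countWithListFiles(File root) {
        long count = 0;
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File[] children = pending.pop().listFiles();
            if (children == null) {
                continue; // not a directory, or an I/O error occurred
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child);
                } else if (child.isFile()) {
                    count++;
                }
            }
        }
        return count;
    }

    // Times both traversals over the directory given as the first argument
    // (defaults to the current directory).
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");

        long t0 = System.nanoTime();
        long viaWalk = countWithWalkFileTree(root);
        long t1 = System.nanoTime();
        long viaList = countWithListFiles(root.toFile());
        long t2 = System.nanoTime();

        System.out.printf("walkFileTree: %d files in %d ms%n", viaWalk, (t1 - t0) / 1_000_000);
        System.out.printf("listFiles:    %d files in %d ms%n", viaList, (t2 - t1) / 1_000_000);
    }
}
```

Note that the two methods are not exact equivalents: Files.walkFileTree does not follow symbolic links by default, whereas File.isDirectory does, so counts can differ on trees that contain links. For a fair timing comparison, run each variant in a fresh JVM or after clearing the OS file cache, since the second traversal otherwise benefits from cached metadata.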