[ 
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725565#comment-17725565
 ] 

ASF subversion and git services commented on NIFI-11557:
--------------------------------------------------------

Commit a84a7cb60aa3bbae900ade5d1f2413b71dabdf38 in nifi's branch 
refs/heads/support/nifi-1.x from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a84a7cb60a ]

NIFI-11557: Fixed error with Java 11 code


> Eliminate use of Files.walkFileTree for any performance-critical parts of 
> application
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-11557
>                 URL: https://issues.apache.org/jira/browse/NIFI-11557
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>              Labels: content-repo, content-repository, performance, slowness, 
> startup
>             Fix For: 2.0.0, 1.22.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The FileSystemRepository (content repo implementation) as well as ListFile 
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a 
> user who had horribly long startup times. Thread dumps show that the time was 
> almost entirely in the FileSystemRepository's {{initializeRepository}} method 
> as it is walking the file tree in order to determine which archive files can 
> be cleaned up next. This is done during startup and again periodically in 
> background threads.
> I made a small modification locally to instead use the standard synchronous 
> IO method {{File.listFiles}}. I used GenerateFlowFile to generate 1-byte 
> FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in 
> nifi.properties in order to generate a huge number of files (about 1.2 
> million files in the content repository), then restarted a few times. 
> Additionally, I added some log lines to show how long this part of the 
> startup process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new 
> implementation, it took 6.7 seconds. This appears to be due to the fact that 
> NIO.2 does an individual disk access for every file to obtain its File 
> attributes, whereas the File objects returned by the {{File.listFiles}} 
> method already have the necessary attributes. As a result, the NIO.2 
> approach makes millions of disk accesses that are unnecessary. As the number 
> of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any 
> performance-critical parts of the codebase.
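
For readers following the description above, here is a minimal sketch of a directory traversal built on File.listFiles rather than Files.walkFileTree. It is not the code from the NiFi commit: the class name ListFilesWalk, the helper countFiles, and the temporary directory layout in main are all hypothetical and exist only for illustration. It shows the shape of the traversal and makes no performance claim of its own.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

public class ListFilesWalk {

    // Hypothetical helper (not from NiFi): counts the non-directory entries
    // under root by listing each directory with File.listFiles.
    public static long countFiles(final File root) {
        long count = 0;
        final Deque<File> pending = new ArrayDeque<>();
        pending.push(root);

        while (!pending.isEmpty()) {
            final File dir = pending.pop();
            final File[] children = dir.listFiles();
            if (children == null) {
                // listFiles returns null if dir is not a directory
                // or if an I/O error occurred while listing it.
                continue;
            }
            for (final File child : children) {
                if (child.isDirectory()) {
                    pending.push(child);
                } else {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(final String[] args) throws IOException {
        // Assumed demo layout: one file at the top level, one in a subdirectory.
        final Path root = Files.createTempDirectory("listfiles-demo");
        final Path sub = Files.createDirectory(root.resolve("sub"));
        Files.createFile(root.resolve("a.bin"));
        Files.createFile(sub.resolve("b.bin"));

        System.out.println("files found: " + countFiles(root.toFile()));
    }
}
```

The explicit stack avoids deep recursion on heavily nested directories. Anyone adapting this should time it against Files.walkFileTree on their own file system, since the numbers quoted in the issue come from one specific test setup.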



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
