[ 
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725535#comment-17725535
 ] 

ASF subversion and git services commented on NIFI-11557:
--------------------------------------------------------

Commit a12c9ca9c72e8004afaf2f91088141ffd67ac437 in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a12c9ca9c7 ]

NIFI-11557: Avoid using the expensive and unnecessary Files.walkFileTree on 
startup and initialization of Content Repository. Also performed some code 
cleanup: IntelliJ flagged many warnings in the class, mostly around methods 
that are no longer used and potential NullPointerExceptions, so those were 
cleaned up. Additionally, removed the nifi property for max flowfiles per claim 
- this property was never implemented. It was referenced, but the way in which 
it was used curiously had nothing to do with what the property was intended for 
or how it was documented. Instead, it was used to limit the max 
number of claims that could remain writable. As a result, it was removed.

NIFI-11557: Added an additional system test and updated GitHub Actions to 
include surefire-report in order to help diagnose a problem that occurred in one 
of the last system-test runs in GitHub. Could not replicate the problem locally.
Signed-off-by: Matthew Burgess <[email protected]>

This closes #7265


> Eliminate use of Files.walkFileTree for any performance-critical parts of 
> application
> -------------------------------------------------------------------------------------
>
>                 Key: NIFI-11557
>                 URL: https://issues.apache.org/jira/browse/NIFI-11557
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>              Labels: content-repo, content-repository, performance, slowness, 
> startup
>             Fix For: 1.latest, 2.latest
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The FileSystemRepository (content repo implementation) as well as ListFile 
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a 
> user who had horribly long startup times. Thread dumps show that the time was 
> almost entirely in the FileSystemRepository's {{initializeRepository}} method 
> as it is walking the file tree in order to determine which archive files can 
> be cleaned up next. This is done during startup and again periodically in 
> background threads.
> I made a small modification locally to instead use the standard synchronous 
> IO method {{File.listFiles}}. I used GenerateFlowFile to generate 
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in 
> nifi.properties in order to generate a huge number of files (about 1.2 
> million files in the content repository) and restarted a few times. 
> I also added some log lines to show how long this part of the startup 
> process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new 
> implementation, it took 6.7 seconds. This appears to be due to the fact that 
> when using NIO.2 for every file, it does an individual disk access to obtain 
> File attributes, while when using the {{File.listFiles}} method the File 
> objects that are returned already have the necessary attributes. As a result, 
> the NIO.2 approach makes millions of disk accesses that are unnecessary. As 
> the number of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any 
> performance-critical parts of the codebase.
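
To make the comparison described above concrete, here is a minimal sketch of the two traversal styles. It is not the NiFi implementation: the class name TraversalComparison and both method names are hypothetical, and it only counts files rather than deciding which archive files can be cleaned up. Timings depend on the filesystem and OS, so none are asserted here.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration class; not part of NiFi.
class TraversalComparison {

    // NIO.2 style: the visitor receives BasicFileAttributes for every entry,
    // so attributes are read per file whether or not the caller needs them.
    static long countWithWalkFileTree(Path root) throws IOException {
        final long[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                count[0]++; // every non-directory entry
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }

    // java.io style: list each directory and descend with an explicit stack.
    // Note that isDirectory() is itself a filesystem query; code that already
    // knows the directory layout could skip that check.
    static long countWithListFiles(File root) {
        long count = 0;
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File[] children = pending.pop().listFiles();
            if (children == null) {
                continue; // not a directory, or an I/O error occurred
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pending.push(child);
                } else {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");

        long start = System.nanoTime();
        long walked = countWithWalkFileTree(root);
        long walkMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        long listed = countWithListFiles(root.toFile());
        long listMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("walkFileTree: " + walked + " files in " + walkMillis + " ms");
        System.out.println("listFiles:    " + listed + " files in " + listMillis + " ms");
    }
}
```

The two methods agree on a tree of plain files and directories. They can differ when symbolic links to directories are present, because Files.walkFileTree does not follow links by default while File.isDirectory does.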



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
