Mark Payne created NIFI-8633:
--------------------------------

             Summary: Content Repository can be improved to make fewer disk 
accesses on read
                 Key: NIFI-8633
                 URL: https://issues.apache.org/jira/browse/NIFI-8633
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne


When {{FileSystemRepository.read(ContentClaim)}} or 
{{FileSystemRepository.read(ResourceClaim)}} is called, the repository 
determines the file path for the claim via {{getPath(claim, true);}} where the 
{{true}} argument indicates that the repository should verify that the file exists.

This is done so that if we were to pass in a ContentClaim that does not exist, 
we throw a more meaningful ContentNotFoundException instead of just letting a 
FileNotFoundException fly.

However, this call to {{Files.exists(Path)}} is fairly expensive, as it requires 
a disk access on every read. For a flow that handles a lot of small files, the 
cumulative cost can be extremely high.

We can improve this by removing the call to {{Files.exists}} altogether. 
Instead, just blindly create the {{FileInputStream}} in a try/catch block and 
catch FileNotFoundException, and then wrap that in a 
{{ContentNotFoundException}}. This results in the same API and the same 
contracts as before but avoids the overhead of additional disk accesses/seeks.
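
The proposed change could look roughly like the sketch below. It is not the 
actual NiFi code: the class name {{ReadWithoutExistsCheck}}, the {{open}} method, 
and the nested {{ContentNotFoundException}} are hypothetical stand-ins used only 
to show opening the stream directly and translating the failure.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

// Hypothetical illustration class; not part of NiFi.
class ReadWithoutExistsCheck {

    // Hypothetical stand-in for NiFi's ContentNotFoundException.
    static class ContentNotFoundException extends IOException {
        ContentNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Opens the file directly, with no prior Files.exists(path) call.
    // A missing file surfaces as FileNotFoundException, which is wrapped
    // so callers still see the more meaningful exception type.
    static InputStream open(Path path) throws ContentNotFoundException {
        try {
            return new FileInputStream(path.toFile());
        } catch (FileNotFoundException e) {
            throw new ContentNotFoundException("Could not find content at " + path, e);
        }
    }
}
```

One point to keep in mind with this approach: {{FileInputStream}} also throws 
{{FileNotFoundException}} when the path is a directory or cannot be opened for 
reading, so those cases would be reported as {{ContentNotFoundException}} as well.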



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
