Mark Payne created NIFI-8633:
--------------------------------
Summary: Content Repository can be improved to make fewer disk
accesses on read
Key: NIFI-8633
URL: https://issues.apache.org/jira/browse/NIFI-8633
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
When {{FileSystemRepository.read(ContentClaim)}} or
{{FileSystemRepository.read(ResourceClaim)}} is called, the repository
determines the file path for the claim via {{getPath(claim, true)}}, where
{{true}} indicates that the repository should verify that the file exists.
This is done so that if we were to pass in a ContentClaim that does not exist,
we throw a more meaningful ContentNotFoundException instead of just letting a
FileNotFoundException fly.
However, this call to {{Files.exists(Path)}} is fairly expensive, as it's a
disk access. For a flow that handles many small files, the cost adds up quickly,
because every read pays for an extra disk access before the file is opened.
We can improve this by removing the call to {{Files.exists}} altogether.
Instead, just blindly create the {{FileInputStream}} in a try/catch block and
catch FileNotFoundException, and then wrap that in a
{{ContentNotFoundException}}. This results in the same API and the same
contracts as before but avoids the overhead of additional disk accesses/seeks.
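A minimal sketch of the proposed change, to illustrate the idea rather than the actual patch. The class name {{ContentReadSketch}}, the method {{open(Path)}}, and the nested {{ContentNotFoundException}} are hypothetical stand-ins so that the example compiles on its own; the real NiFi classes and signatures may differ.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.nio.file.Path;

final class ContentReadSketch {

    // Hypothetical stand-in for NiFi's ContentNotFoundException, defined here
    // only so the sketch is self-contained.
    static class ContentNotFoundException extends RuntimeException {
        ContentNotFoundException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Opens the file directly instead of calling Files.exists(path) first.
    // A missing file surfaces as FileNotFoundException from the constructor,
    // which is translated into the more meaningful exception.
    static InputStream open(final Path path) {
        try {
            return new FileInputStream(path.toFile());
        } catch (final FileNotFoundException e) {
            throw new ContentNotFoundException("Could not find content at " + path, e);
        }
    }

    private ContentReadSketch() {
    }
}
```

Callers still see a ContentNotFoundException for a missing claim, but the existence check and the open are now a single file system operation. One thing to keep in mind is that {{FileInputStream}} also throws FileNotFoundException when the path is a directory or cannot be opened for reading, so those cases would be reported as missing content as well.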
--
This message was sent by Atlassian Jira
(v8.3.4#803005)