[
https://issues.apache.org/jira/browse/NIFI-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-8633:
-----------------------------
Fix Version/s: 1.14.0
Status: Patch Available (was: Open)
> Content Repository can be improved to make fewer disk accesses on read
> ----------------------------------------------------------------------
>
> Key: NIFI-8633
> URL: https://issues.apache.org/jira/browse/NIFI-8633
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 1.14.0
>
>
> When {{FileSystemRepository.read(ContentClaim)}} or
> {{FileSystemRepository.read(ResourceClaim)}} is called, the repository
> determines the file path for the claim via {{getPath(claim, true)}}, where
> {{true}} indicates that the repository should verify that the file exists.
> This is done so that if a ContentClaim that does not exist is passed in, we
> throw a more meaningful {{ContentNotFoundException}} instead of just letting a
> {{FileNotFoundException}} propagate.
> However, this call to {{Files.exists(Path)}} is fairly expensive, because it
> requires a disk access. For a flow that processes many small files, paying
> this cost on every read can become extremely expensive.
> We can improve this by removing the call to {{Files.exists}} altogether.
> Instead, simply create the {{FileInputStream}} in a try/catch block, catch
> {{FileNotFoundException}}, and wrap it in a {{ContentNotFoundException}}.
> This preserves the same API and contracts as before while avoiding the
> overhead of additional disk accesses/seeks.
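A minimal Java sketch contrasting the two approaches described above. This is not the actual FileSystemRepository code: it works on a plain java.nio.file.Path rather than a ContentClaim, the class name ContentReadSketch and both method names are hypothetical, and the nested ContentNotFoundException is a stand-in for NiFi's class of the same name (whose real constructors differ).

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentReadSketch {

    // Hypothetical stand-in for NiFi's ContentNotFoundException.
    static class ContentNotFoundException extends RuntimeException {
        ContentNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Check-then-open: Files.exists costs an extra filesystem call on every read.
    static InputStream readWithExistsCheck(Path path) throws IOException {
        if (!Files.exists(path)) {
            throw new ContentNotFoundException("Content not found: " + path, null);
        }
        return new FileInputStream(path.toFile());
    }

    // Open directly and translate the failure into the more meaningful exception.
    // Note: FileInputStream also throws FileNotFoundException for directories and
    // files that cannot be opened for reading, so those cases are mapped here too.
    static InputStream readOptimistically(Path path) {
        try {
            return new FileInputStream(path.toFile());
        } catch (FileNotFoundException e) {
            throw new ContentNotFoundException("Content not found: " + path, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempFile("claim", ".bin");
        Files.write(existing, new byte[] {1, 2, 3});

        // Happy path: the stream opens without any prior existence check.
        try (InputStream in = readOptimistically(existing)) {
            System.out.println("read " + in.readAllBytes().length + " bytes");
        }

        // Missing file: the caller still sees ContentNotFoundException, as before.
        Path missing = existing.resolveSibling("no-such-claim-" + System.nanoTime());
        try {
            readOptimistically(missing);
        } catch (ContentNotFoundException e) {
            System.out.println("caught ContentNotFoundException");
        }

        Files.delete(existing);
    }
}
```

Both methods surface the same exception type to callers; the second simply lets the open attempt double as the existence check. As a side benefit, it also avoids the window in which a file could be deleted between the {{Files.exists}} call and the open.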
--
This message was sent by Atlassian Jira
(v8.3.4#803005)