[ https://issues.apache.org/jira/browse/NIFI-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe Witt updated NIFI-8633:
---------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Content Repository can be improved to make fewer disk accesses on read
> ----------------------------------------------------------------------
>
>                 Key: NIFI-8633
>                 URL: https://issues.apache.org/jira/browse/NIFI-8633
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.14.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When {{FileSystemRepository.read(ContentClaim)}} or 
> {{FileSystemRepository.read(ResourceClaim)}} is called, the repository 
> determines the file path for the claim via {{getPath(claim, true)}}, where 
> {{true}} indicates that we should verify that the file exists.
> This is done so that if we were to pass in a ContentClaim that does not 
> exist, we throw a more meaningful ContentNotFoundException instead of just 
> letting a FileNotFoundException fly.
> However, this call to {{Files.exists(Path)}} is fairly expensive because it 
> requires a disk access. For a flow that handles many small files, this cost 
> is paid on every read and can become extremely expensive.
> We can improve this by removing the call to {{Files.exists}} altogether. 
> Instead, we can simply create the {{FileInputStream}} in a try/catch block, 
> catch the FileNotFoundException, and wrap it in a 
> {{ContentNotFoundException}}. This results in the same API and the same 
> contracts as before but avoids the overhead of the additional disk accesses/seeks.
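The difference between the two approaches in the description can be sketched in plain Java. This is a minimal illustration, not NiFi's actual {{FileSystemRepository}} code. The class {{ContentReadSketch}}, the methods {{readWithExistsCheck}} and {{readOptimistic}}, and the simplified {{ContentNotFoundException}} are all hypothetical stand-ins. The sketch also takes a plain {{Path}} instead of a {{ContentClaim}}.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentReadSketch {

    // Hypothetical, simplified stand-in for NiFi's ContentNotFoundException.
    public static class ContentNotFoundException extends RuntimeException {
        public ContentNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Check-then-open: Files.exists() performs an extra filesystem lookup
    // on every read, even when the file is present.
    public static InputStream readWithExistsCheck(Path path) throws IOException {
        if (!Files.exists(path)) {
            throw new ContentNotFoundException("Content not found: " + path, null);
        }
        // The file may still vanish between the check and this open,
        // in which case a FileNotFoundException escapes anyway.
        return new FileInputStream(path.toFile());
    }

    // Open directly: a missing file surfaces as FileNotFoundException,
    // which is wrapped so callers still see ContentNotFoundException.
    public static InputStream readOptimistic(Path path) {
        try {
            return new FileInputStream(path.toFile());
        } catch (FileNotFoundException e) {
            throw new ContentNotFoundException("Content not found: " + path, e);
        }
    }
}
```

The {{FileInputStream}} constructor also throws {{FileNotFoundException}} when the path is a directory or cannot be opened for some other reason. In this sketch, those cases map to {{ContentNotFoundException}} as well.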



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
