[ 
https://issues.apache.org/jira/browse/NIFI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joey Frazee updated NIFI-7886:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> FetchAzureBlobStorage, FetchS3Object, and FetchGCSObject processors should be 
> able to fetch ranges
> --------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-7886
>                 URL: https://issues.apache.org/jira/browse/NIFI-7886
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.12.0, 1.13.0
>            Reporter: Paul Kelly
>            Assignee: Paul Kelly
>            Priority: Minor
>              Labels: azureblob, gcs, s3
>             Fix For: 1.14.0
>
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> Azure Blob Storage, AWS S3, and Google Cloud Storage all support retrieving 
> byte ranges of stored objects.  Current versions of NiFi processors for these 
> services do not support fetching by byte range.
> Fetching by range would enable several enhancements:
>  * Parallelized downloads
>  ** Faster speeds when a single connection's throughput (bounded by its 
> window size and round-trip time) is lower than the available bandwidth
>  ** Load distribution over a cluster
>  * Cost savings
>  ** If the file is large and only part of the file is needed, the desired 
> part of the file can be downloaded, saving bandwidth costs by not retrieving 
> unnecessary bytes
>  ** Download failures would only need to retry the failed segment, rather 
> than the full file
>  * Download extremely large files
>  ** Ability to download files that are larger than the available content 
> repository by downloading a segment and moving it off to a system with more 
> capacity before downloading another segment
>  
> Some of these enhancements would require an upstream processor to generate 
> multiple flow files, each covering a different part of the overall range.  
> Something like this:
> ListS3 -> ExecuteGroovyScript (to split into multiple flow files with 
> different range attributes) -> FetchS3Object.
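
The splitting step in the flow above could be sketched roughly as follows. This
is only an illustration in Python, not the script or the implementation from
this issue: the function names split_into_ranges and http_range_header are
hypothetical, and it assumes the upstream processor already knows the object
size (for example from the listing) and passes each (start, length) pair to the
fetch processor as flow file attributes.

```python
from typing import Iterator, Tuple


def split_into_ranges(total_size: int, segment_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, length) pairs covering total_size bytes with no gaps or overlap.

    Hypothetical helper: each pair would become one flow file whose attributes
    tell the fetch processor which part of the object to retrieve.
    """
    if total_size < 0:
        raise ValueError("total_size must be non-negative")
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    for start in range(0, total_size, segment_size):
        # The last segment may be shorter than segment_size.
        yield start, min(segment_size, total_size - start)


def http_range_header(start: int, length: int) -> str:
    """Format an HTTP Range header value; the end offset is inclusive."""
    if length <= 0:
        raise ValueError("length must be positive")
    return f"bytes={start}-{start + length - 1}"


if __name__ == "__main__":
    # A 10-byte object split into 4-byte segments.
    for start, length in split_into_ranges(10, 4):
        print(start, length, http_range_header(start, length))
```

For a 10-byte object and 4-byte segments this yields (0, 4), (4, 4), (8, 2),
which correspond to the header values bytes=0-3, bytes=4-7, and bytes=8-9. A
failed segment can then be retried on its own, since each pair is independent
of the others.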



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
