[
https://issues.apache.org/jira/browse/NIFI-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joey Frazee updated NIFI-7886:
------------------------------
Affects Version/s: (was: 1.13.2)
(was: 1.13.1)
> FetchAzureBlobStorage, FetchS3Object, and FetchGCSObject processors should be
> able to fetch ranges
> --------------------------------------------------------------------------------------------------
>
> Key: NIFI-7886
> URL: https://issues.apache.org/jira/browse/NIFI-7886
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.12.0, 1.13.0
> Reporter: Paul Kelly
> Assignee: Paul Kelly
> Priority: Minor
> Labels: azureblob, gcs, s3
> Fix For: 1.14.0
>
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Azure Blob Storage, AWS S3, and Google Cloud Storage all support retrieving
> byte ranges of stored objects. Current versions of NiFi processors for these
> services do not support fetching by byte range.
> Fetching by range would enable several enhancements:
> * Parallelized downloads
> ** Faster speeds when a single connection cannot fill the available
> bandwidth (for example, on a link with a high bandwidth-delay product)
> ** Load distribution over a cluster
> * Cost savings
> ** If the file is large and only part of the file is needed, the desired
> part of the file can be downloaded, saving bandwidth costs by not retrieving
> unnecessary bytes
> ** A failed download would only need to retry the failed segment, rather
> than the full file
> * Download extremely large files
> ** Ability to download files that are larger than the available content repo
> by downloading a segment and moving it off to a system with more capacity
> before downloading another segment
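For illustration only: a byte-range request is commonly expressed with the HTTP Range header syntax (bytes=first-last, with an inclusive last byte). The sketch below shows how a start offset and a length could map onto that form. The function name range_header is hypothetical and is not part of NiFi or any cloud SDK.

```python
def range_header(start: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes beginning at `start`.

    Hypothetical helper for illustration; not a NiFi or SDK API.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if length <= 0:
        raise ValueError("length must be positive")
    # The last-byte position in a Range header is inclusive.
    end = start + length - 1
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # Second 1 MiB segment of an object.
    print(range_header(1024 * 1024, 1024 * 1024))
```

Because each segment is addressed independently, a failed segment can be requested again with the same header value without touching the others.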
>
> Some of these enhancements would require an upstream processor to generate
> multiple flow files, each covering a different part of the overall range.
> Something like this:
> ListS3 -> ExecuteGroovyScript (to split into multiple flow files with
> different range attributes) -> FetchS3Object.
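A minimal sketch of the splitting step described above, written in Python rather than Groovy and independent of the NiFi API. It assumes the object size is known from the listing step. The attribute names range.start and range.length are hypothetical placeholders, not names defined by this issue.

```python
from typing import Dict, List


def split_into_ranges(object_size: int, segment_size: int) -> List[Dict[str, str]]:
    """Return one attribute map per segment, covering the object without gaps.

    Attribute names are hypothetical; values are strings because flow file
    attributes are string-valued.
    """
    if object_size < 0:
        raise ValueError("object_size must be non-negative")
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")

    segments = []
    for start in range(0, object_size, segment_size):
        # The final segment may be shorter than segment_size.
        length = min(segment_size, object_size - start)
        segments.append({"range.start": str(start), "range.length": str(length)})
    return segments


if __name__ == "__main__":
    for attrs in split_into_ranges(object_size=10, segment_size=4):
        print(attrs)
```

Each returned map would correspond to one flow file sent on to the fetch processor. An empty object yields no segments under this sketch, so a real flow would need to decide how to handle that case.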
--
This message was sent by Atlassian Jira
(v8.3.4#803005)