Paul Kelly created NIFI-7886:
--------------------------------
Summary: FetchAzureBlobStorage, FetchS3Object, and FetchGCSObject
processors should be able to fetch ranges
Key: NIFI-7886
URL: https://issues.apache.org/jira/browse/NIFI-7886
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Affects Versions: 1.12.0
Reporter: Paul Kelly
Assignee: Paul Kelly
Azure Blob Storage, AWS S3, and Google Cloud Storage all support retrieving
byte ranges of stored objects. Current versions of NiFi processors for these
services do not support fetching by byte range.
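For illustration, at the HTTP level a byte range is normally requested with a
header value of the form bytes=first-last, where both ends are inclusive. The
sketch below only shows that arithmetic; the function name range_header is
hypothetical and is not part of NiFi or of any cloud SDK.

```python
def range_header(start: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `start`.

    Hypothetical helper for illustration only; not a NiFi or SDK API.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, so the last byte
    # requested is start + length - 1.
    return f"bytes={start}-{start + length - 1}"
```

For example, range_header(0, 100) gives "bytes=0-99", which covers the first
100 bytes of an object.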
Fetching by range would enable several enhancements:
* Parallelized downloads
** Faster speeds when a single connection's throughput, which is capped by its
window size divided by the round-trip time, is lower than the available
bandwidth
** Load distribution over a cluster
* Cost savings
** If the file is large and only part of the file is needed, the desired part
of the file can be downloaded, saving bandwidth costs by not retrieving
unnecessary bytes
** A failed download would only require retrying the failed segment, rather
than the full file
* Download extremely large files
** Ability to download files that are larger than the available content
repository by downloading a segment and moving it off to a system with more
capacity before downloading another segment
Some of these enhancements would require an upstream processor to generate
multiple flow files, each covering a different part of the overall range.
Something like this:
ListS3 -> ExecuteGroovyScript (to split into multiple flow files with different
range attributes) -> FetchS3Object.
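The splitting step could look roughly like the sketch below, written in Python
rather than Groovy for brevity. It only computes the segments; the attribute
names range.start and range.length and the function name split_ranges are
assumptions made for illustration, not names the processors define.

```python
from typing import Dict, List


def split_ranges(total_size: int, segment_size: int) -> List[Dict[str, str]]:
    """Split an object of `total_size` bytes into consecutive segments.

    Returns one attribute map per segment, as an upstream processor might
    attach to each generated flow file. Attribute names are hypothetical.
    """
    if total_size < 0 or segment_size <= 0:
        raise ValueError("total_size must be >= 0 and segment_size must be > 0")
    segments = []
    for start in range(0, total_size, segment_size):
        # The final segment may be shorter than segment_size.
        length = min(segment_size, total_size - start)
        segments.append({
            "range.start": str(start),
            "range.length": str(length),
        })
    return segments
```

With a 10-byte object and 4-byte segments, this yields three segments starting
at offsets 0, 4, and 8 with lengths 4, 4, and 2. Each map would become the
attributes of one flow file passed to the fetch processor.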
--
This message was sent by Atlassian Jira
(v8.3.4#803005)