Paul Kelly created NIFI-7886:
--------------------------------

             Summary: FetchAzureBlobStorage, FetchS3Object, and FetchGCSObject 
processors should be able to fetch ranges
                 Key: NIFI-7886
                 URL: https://issues.apache.org/jira/browse/NIFI-7886
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: 1.12.0
            Reporter: Paul Kelly
            Assignee: Paul Kelly


Azure Blob Storage, AWS S3, and Google Cloud Storage all support retrieving 
byte ranges of stored objects.  Current versions of NiFi processors for these 
services do not support fetching by byte range.

Fetching by range would enable several enhancements:
 * Parallelized downloads
 ** Faster speeds when a single connection's throughput, limited by its window 
size and round-trip time, is lower than the available bandwidth
 ** Load distribution over a cluster
 * Cost savings
 ** If only part of a large file is needed, just that part can be downloaded, 
saving bandwidth costs by not retrieving unnecessary bytes
 ** A failed download would only need to retry the failed segment, rather than 
the full file
 * Download extremely large files
 ** Ability to download files that are larger than the available content repo 
by downloading a segment and moving it off to a system with more capacity 
before downloading another segment
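
The parallel-download and retry points above can be sketched roughly as 
follows. This is an illustration in Python, not NiFi code: fetch_range is a 
hypothetical callable standing in for a ranged GET against any of the three 
services, and max_attempts and workers are assumed parameters.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) byte offsets


def fetch_with_retry(fetch_range: Callable[[int, int], bytes],
                     byte_range: Range, max_attempts: int = 3) -> bytes:
    """Fetch one segment, retrying only that segment on failure."""
    start, end = byte_range
    for attempt in range(1, max_attempts + 1):
        try:
            # fetch_range is hypothetical: it returns bytes start..end inclusive
            return fetch_range(start, end)
        except IOError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def fetch_all(fetch_range: Callable[[int, int], bytes],
              ranges: List[Range], workers: int = 4) -> bytes:
    """Fetch segments in parallel and reassemble them in range order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so segments stay ordered
        parts = pool.map(lambda r: fetch_with_retry(fetch_range, r), ranges)
        return b"".join(parts)
```

Only the segment that failed is requested again; segments that already 
succeeded are not re-downloaded.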

 

Some of these enhancements would require an upstream processor to generate 
multiple flow files, each covering a different part of the overall range.  
Something like this:
ListS3 -> ExecuteGroovyScript (to split into multiple flow files with different 
range attributes) -> FetchS3Object.
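
As a rough sketch of the splitting step, the function below divides an object 
of known size into fixed-size inclusive byte ranges and formats each one as an 
HTTP Range header value. It is written in Python for illustration only; the 
function names and the segment_size parameter are assumptions, not part of any 
NiFi processor, and this issue does not specify how the range attributes would 
be named.

```python
from typing import List, Tuple


def split_ranges(object_size: int, segment_size: int) -> List[Tuple[int, int]]:
    """Split [0, object_size) into inclusive (start, end) byte ranges."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # The last range is clamped so it never runs past the end of the object
    return [(start, min(start + segment_size, object_size) - 1)
            for start in range(0, object_size, segment_size)]


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

For a 10-byte object and a segment size of 4, split_ranges(10, 4) yields 
(0, 3), (4, 7), and (8, 9); each pair would become the range attributes on one 
flow file.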



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
