Jasper Knulst created NIFI-6313:
-----------------------------------

             Summary: PutGCSObject performance seems slow
                 Key: NIFI-6313
                 URL: https://issues.apache.org/jira/browse/NIFI-6313
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework, Extensions
    Affects Versions: 1.9.2
            Reporter: Jasper Knulst
             Fix For: 1.10.0


The PutGCSObject processor, which transfers files to a Google Cloud Storage
(GCS) bucket, has poor transfer speeds.

It is impossible to put a single hard figure on the throughput, as it seems 
to depend on:

- Network location of the NiFi node (inside Google Cloud or not)
- Network bandwidth
- NiFi node specs

In benchmarks on multiple NiFi clusters (ranging from test setups to 
production sites), throughput ranged from 8 MB/s to 800 MB/s.

"Slow" here means slow in comparison to gsutil. If you run gsutil directly 
on the NiFi node, throughput is 5 to 8 times higher without 
'parallel_composite_upload' and up to 16 times higher with 
'parallel_composite_upload'.

The Google Cloud Java API on which NiFi's GCS processors are built does not 
have the same optimizations as gsutil and may not be actively 
supported/maintained. The Storage.create method is even deprecated.

Still, there must be ways to speed up transfers to GCS, for example by 
implementing parallel composite uploads in chunks and exposing configuration 
options for them on the GCS processors.
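
As a rough illustration of the chunking idea, the sketch below splits a file
into byte ranges and hands them to a thread pool. It is plain Java and makes
no GCS calls. CompositeUploadPlan, Chunk, ChunkUploader and uploadAll are
hypothetical names, not part of NiFi or the google-cloud-storage client. The
cap of 32 parts reflects the limit on source objects in a single GCS compose
request. How each range is actually uploaded, and the final compose call, are
left to the ChunkUploader implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: plans and runs a chunked upload; not NiFi or GCS API. */
final class CompositeUploadPlan {

    /** A single GCS compose request accepts at most 32 source objects. */
    static final int MAX_COMPONENTS = 32;

    /** Half-open byte range [start, end) of the source file. */
    public static final class Chunk {
        public final int index;
        public final long start;
        public final long end;

        Chunk(int index, long start, long end) {
            this.index = index;
            this.start = start;
            this.end = end;
        }

        public long length() {
            return end - start;
        }
    }

    /** Hypothetical hook: uploads one range and returns the temporary object name. */
    public interface ChunkUploader {
        String upload(Chunk chunk) throws Exception;
    }

    /** Splits fileSize bytes into at most MAX_COMPONENTS contiguous chunks. */
    public static List<Chunk> plan(long fileSize, long preferredChunkSize) {
        if (fileSize < 0 || preferredChunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunk size > 0");
        }
        List<Chunk> chunks = new ArrayList<>();
        // Grow the chunk size when the preferred size would need more than 32 parts.
        long minChunkSize = (fileSize + MAX_COMPONENTS - 1) / MAX_COMPONENTS;
        long chunkSize = Math.max(preferredChunkSize, minChunkSize);
        int index = 0;
        for (long start = 0; start < fileSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, fileSize);
            chunks.add(new Chunk(index++, start, end));
        }
        return chunks;
    }

    /** Uploads all chunks on a fixed-size pool; names come back in chunk order. */
    public static List<String> uploadAll(List<Chunk> chunks, ChunkUploader uploader, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Chunk chunk : chunks) {
                futures.add(pool.submit(() -> uploader.upload(chunk)));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> future : futures) {
                names.add(future.get()); // rethrows an upload failure as ExecutionException
            }
            return names;
        } finally {
            pool.shutdown();
        }
    }
}
```

The preferred chunk size and the thread count are the kind of values that
could be exposed as processor properties. The returned names are in chunk
order, which is the order a compose step would need.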



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
