Jasper Knulst created NIFI-6313:
-----------------------------------
Summary: PutGCSObject performance seems slow
Key: NIFI-6313
URL: https://issues.apache.org/jira/browse/NIFI-6313
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework, Extensions
Affects Versions: 1.9.2
Reporter: Jasper Knulst
Fix For: 1.10.0
The PutGCSObject processor, which transfers files to a Google Cloud Platform
bucket, has poor transfer speeds.
It is impossible to put hard figures on the throughput, as it seems to depend
on:
- Network location of the NiFi node (inside Google Cloud or not)
- Network bandwidth
- NiFi node specs
In benchmarks on multiple NiFi clusters (ranging from test setups to production
sites), the throughput ranged from 8 MB/s to 800 MB/s.
"Slow" here means slow in comparison to gsutil. If you run gsutil directly
from the NiFi node, throughput is 5 to 8 times higher without
'parallel_composite_upload' and up to 16 times higher with
'parallel_composite_upload'.
The Google Cloud Java API on which NiFi's GCS processors are built does not
have the same optimizations as gsutil and may not be actively supported or
maintained. The Storage.create method is even deprecated.
Still, there must be ways to speed up transfers to GCS, for example by
implementing parallel composite uploads in chunks and by exposing configuration
options on the GCS processors.
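
As a rough illustration of the chunking idea, here is a minimal Python sketch
(not NiFi or GCS client code). It splits a payload into byte ranges and uploads
them concurrently. The names plan_chunks, parallel_upload and upload_chunk are
hypothetical and used only for illustration; the cap of 32 components reflects
the GCS limit on source objects per compose request. The final compose step and
any real client calls are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

MAX_COMPONENTS = 32  # GCS accepts at most 32 source objects per compose request


def plan_chunks(total_size: int, chunk_size: int,
                max_components: int = MAX_COMPONENTS) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges that cover total_size bytes."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    if total_size == 0:
        return []
    # Grow the chunk size when needed so one compose call can join all parts.
    min_chunk = -(-total_size // max_components)  # ceiling division
    size = max(chunk_size, min_chunk)
    return [(offset, min(size, total_size - offset))
            for offset in range(0, total_size, size)]


def parallel_upload(data: bytes, chunk_size: int,
                    upload_chunk: Callable[[int, bytes], str],
                    workers: int = 4) -> List[str]:
    """Upload every chunk concurrently; return component names in order.

    upload_chunk is a hypothetical callable (index, bytes) -> object name
    that a real implementation would back with a storage client.
    """
    ranges = plan_chunks(len(data), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_chunk, i, data[off:off + length])
                   for i, (off, length) in enumerate(ranges)]
        # Collect results in submission order so components stay ordered.
        return [f.result() for f in futures]
```

In a processor, the chunk size and worker count would be natural candidates
for the configuration options mentioned above, and the returned component
names would be passed to a compose request in order.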
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)