[
https://issues.apache.org/jira/browse/NIFI-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769207#comment-15769207
]
ASF GitHub Bot commented on NIFI-3236:
--------------------------------------
Github user ijokarumawak commented on the issue:
https://github.com/apache/nifi/pull/1347
I tested the change with the following NiFi flow:
- GenerateFlowFile
- FetchFile: Fetch a JSON file containing 10,000 objects in an array
- SplitJson: Emit 10,000 flow files, Json path: "$[*]"
- UpdateAttribute: 'Run Duration' set to 2 secs so that flow files are consumed fast enough

The input JSON file was generated with this command, then wrapped in "[]":
```
for i in `seq 1 10000`; do echo "{\"name\": \"item-$i\", \"value\": $i},";
done > /tmp/input.json
```
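The command above leaves a trailing comma after the last object and needs the brackets added by hand. As a rough sketch (not what was used for the test), the same 10,000 objects can be written as a valid array in one step. The `OUT` variable is only an illustrative name and defaults to the same path:

```shell
# Hypothetical OUT variable; defaults to the path used in the command above.
OUT="${OUT:-/tmp/input.json}"
{
  printf '[\n'
  for i in $(seq 1 10000); do
    # Put the comma before every object except the first,
    # so the last object is not followed by a trailing comma.
    if [ "$i" -gt 1 ]; then
      printf ',\n'
    fi
    printf '{"name": "item-%d", "value": %d}' "$i" "$i"
  done
  printf '\n]\n'
} > "$OUT"
```

Each object ends up on its own line, so `grep -c '"name"' "$OUT"` is a quick way to check the count.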
I then kept the flow running for more than 5 minutes to measure the outgoing
throughput at SplitJson.
| Before patch | After patch |
|--------------|-------------|
| 3,940,000 flow files (123.17 MB) / 5 min | 4,570,000 flow files (142.86 MB) / 5 min |
I repeated the measurement a few times and confirmed that it is consistently
faster with this patch.
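For a per-second view of the figures in the table, here is a small sketch of the arithmetic. It assumes each count covers exactly a 300-second window; the variable names are only illustrative:

```shell
# Flow-file counts from the table; window_secs=300 is an assumption
# that each figure covers exactly 5 minutes.
before=3940000
after=4570000
window_secs=300

before_rate=$(awk -v n="$before" -v s="$window_secs" 'BEGIN { printf "%.0f", n / s }')
after_rate=$(awk -v n="$after" -v s="$window_secs" 'BEGIN { printf "%.0f", n / s }')
speedup_pct=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", (a / b - 1) * 100 }')

echo "before:  ${before_rate} flow files/sec"   # 13133
echo "after:   ${after_rate} flow files/sec"    # 15233
echo "speedup: ${speedup_pct}%"                 # 16.0%
```

That works out to roughly a 16% increase in outgoing throughput, which matches the ratio of the reported sizes (142.86 MB vs 123.17 MB).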
Thanks @brosander, I'll merge this into master!
> SplitJson performance improvements
> ----------------------------------
>
> Key: NIFI-3236
> URL: https://issues.apache.org/jira/browse/NIFI-3236
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Bryan Rosander
> Assignee: Bryan Rosander
> Priority: Minor
> Attachments: after.png, before.png
>
>
> SplitJson does a lot of work in every onTrigger() that it doesn't need to.
> This includes putting each attribute separately as well as looping over the
> output segments twice.
> It also fetches a property in onTrigger() that could be gotten in an
> @OnScheduled method.