[ 
https://issues.apache.org/jira/browse/NIFI-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769207#comment-15769207
 ] 

ASF GitHub Bot commented on NIFI-3236:
--------------------------------------

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1347
  
    I tested the change with the following NiFi flow:
    
    - GenerateFlowFile
    - FetchFile: Fetch a JSON file containing 10,000 objects in an array
    - SplitJson: Emit 10,000 flow files, Json path: "$[*]"
    - UpdateAttribute: 'Run Duration' set to 2 secs so that it consumes flow files fast enough
    
    
![image](https://cloud.githubusercontent.com/assets/1107620/21416832/77dfd668-c859-11e6-979b-49606d79eea3.png)
    
    
    The input JSON file was generated by this command, and wrapped with "[]":
    
    ```
    for i in `seq 1 10000`; do echo "{\"name\": \"item-$i\", \"value\": $i},"; done > /tmp/input.json
    ```
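    
    Note that the loop above leaves a comma after the last object, which strict JSON parsers reject once the content is wrapped with "[]". A minimal sketch that writes a valid array in one step, assuming a POSIX shell with `seq` and `awk` (the `N` and `OUT` variable names are only illustrative):
    
    ```shell
    # Illustrative names: N is the object count, OUT the target file.
    N=10000
    OUT=/tmp/input.json
    {
      printf '['
      # Emit ",\n" before every object except the first, so no trailing comma remains.
      seq 1 "$N" | awk '{ printf "%s{\"name\": \"item-%d\", \"value\": %d}", (NR > 1 ? ",\n" : ""), $1, $1 }'
      printf ']\n'
    } > "$OUT"
    ```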
    
    Then I kept it running for more than 5 minutes to measure the outgoing throughput at SplitJson.
    
    | Before patch | After patch |
    |--------------|-------------|
    | 3,940,000 flow files (123.17 MB) / 5 min <br/> ![image](https://cloud.githubusercontent.com/assets/1107620/21416835/883e3c0c-c859-11e6-8ccc-155269f29ca0.png) <br/> ![image](https://cloud.githubusercontent.com/assets/1107620/21416903/f30b739c-c859-11e6-8277-abb75dee8567.png) | 4,570,000 flow files (142.86 MB) / 5 min <br/> ![image](https://cloud.githubusercontent.com/assets/1107620/21416839/8eacd5c6-c859-11e6-9823-5e356205dd4f.png) <br/> ![image](https://cloud.githubusercontent.com/assets/1107620/21416893/e5f89b62-c859-11e6-9e37-3338e190bb2c.png) |
    
    I repeated the measurement a few times and confirmed that it is consistently faster with this patch.
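    
    As a rough sanity check on the table, the two counts can be converted into a per-second rate and a relative gain. This is only a sketch: it assumes each count covers exactly 300 seconds, and the variable names are illustrative.
    
    ```shell
    # Flow-file counts taken from the table above.
    before=3940000
    after=4570000
    window=300   # assumption: each count covers exactly 5 minutes
    
    rate_before=$(awk -v n="$before" -v w="$window" 'BEGIN { printf "%.0f", n / w }')
    rate_after=$(awk -v n="$after" -v w="$window" 'BEGIN { printf "%.0f", n / w }')
    gain=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", (a - b) / b * 100 }')
    
    echo "before: $rate_before flow files/sec"
    echo "after:  $rate_after flow files/sec"
    echo "gain:   $gain%"
    ```
    
    With these numbers the gain comes out at roughly 16%, which matches the ratio of the reported data sizes (142.86 MB vs 123.17 MB).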
    
    Thanks @brosander , I'll merge this into master!


> SplitJson performance improvements
> ----------------------------------
>
>                 Key: NIFI-3236
>                 URL: https://issues.apache.org/jira/browse/NIFI-3236
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Bryan Rosander
>            Assignee: Bryan Rosander
>            Priority: Minor
>         Attachments: after.png, before.png
>
>
> SplitJson does a lot of work in every onTrigger() that it doesn't need to.  
> This includes putting each attribute separately as well as looping over the 
> output segments twice.
> It also fetches a property in onTrigger() that could be gotten in an 
> @OnScheduled method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
