[jira] [Commented] (NIFI-7501) Generate Flowfile does not scale

Dennis Jaheruddin (Jira) Sat, 13 Jun 2020 16:44:33 -0700


    [ https://issues.apache.org/jira/browse/NIFI-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134979#comment-17134979 ]


Dennis Jaheruddin commented on NIFI-7501:
-----------------------------------------

I agree that the recommended pattern for most load testing is 
GenerateFlowFile > DuplicateFlowFile. 

Currently, someone using GenerateFlowFile would not easily find 
DuplicateFlowFile, so I added the load tag and a reference in the description.

 

I can also confirm why I could not scale GenerateFlowFile further despite 
tweaking the basic settings: I simply hit the IO limit of my disk. Hence, no 
further enhancements appear to be needed. (It may be interesting to add a 
duplication factor directly inside GenerateFlowFile, though of course this 
would introduce redundancy with the DuplicateFlowFile processor.)
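
For anyone who wants to check whether they are hitting the same disk IO limit, 
here is a minimal sketch in Python. It is not part of NiFi: the function names, 
the default sizes, and the `directory` parameter are all hypothetical, and it 
only measures sequential write throughput to a temporary file, which is a 
rough stand-in for what a flow actually does. It also converts a byte rate 
into flowfiles per second for a given message size.

```python
import os
import tempfile
import time


def measure_write_throughput(total_bytes=64 * 1024 * 1024,
                             chunk_bytes=1024 * 1024,
                             directory=None):
    """Write about total_bytes to a temporary file in chunk_bytes pieces.

    Returns the observed throughput in bytes per second. `directory` is a
    hypothetical knob: point it at the disk you want to test.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        while written < total_bytes:
            handle.write(chunk)
            written += chunk_bytes
        handle.flush()
        # Ask the OS to push the data to the device, not just the page cache.
        os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    return written / elapsed


def flowfiles_per_second(bytes_per_second, flowfile_bytes):
    """Convert a byte rate into a flowfile rate for a fixed message size."""
    return bytes_per_second / flowfile_bytes


if __name__ == "__main__":
    rate = measure_write_throughput()
    print(f"sequential write: {rate / 1e6:.0f} MB/sec")
    print(f"10 MB flowfiles/sec: {flowfiles_per_second(rate, 10_000_000):.1f}")
```

As a reference point, the approximately 2 GB/sec with 10 MB messages from the 
issue description corresponds to about 200 flowfiles per second (taking 
1 GB = 1000 MB). If the measured write rate on the disk in question is close 
to what the flow achieves, the disk is a plausible bottleneck.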

> Generate Flowfile does not scale
> --------------------------------
>
>                 Key: NIFI-7501
>                 URL: https://issues.apache.org/jira/browse/NIFI-7501
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.11.4
>            Reporter: Dennis Jaheruddin
>            Priority: Minor
>         Attachments: generationperformance.xml
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> One of the purposes of GenerateFlowFile is load testing. Unfortunately, 
> it often appears to become the bottleneck itself. I have found 
> it not to scale well.
> Example result from my laptop:
> I want to generate messages and bring them to a single processor, let's call 
> it processor X.
> With 1 concurrent task, a batch size of 1, a message size of 10 MB, and 
> uniqueness set to false, it can generate approximately 2 GB/sec.
> When allowing for more concurrent tasks, or a larger batch size, no 
> noticeable change is found.
> However, if instead of increasing the batch size I route the success 
> relationship to multiple processors that do 'nothing' (like UpdateAttribute), 
> and then bring the success relationships of all these to processor X, I can 
> get much more than 2 GB/sec. 
>  
> In conclusion: I don't appear to be hitting a hardware limit, as I am able to 
> generate the number of messages in this inelegant way, but no matter how I 
> set up my GenerateFlowFile processor, it just will not scale. This suggests 
> there may be a smarter way to generate data when uniqueness is not required.
>  
> I have attached a template to illustrate my findings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
