[jira] [Resolved] (NIFI-6900) Request to retrieve flow in clustered environment should be much less expensive

Mark Payne (Jira) Mon, 20 Jun 2022 11:07:05 -0700


     [ https://issues.apache.org/jira/browse/NIFI-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mark Payne resolved NIFI-6900.
------------------------------
    Fix Version/s: 1.15.0
       Resolution: Fixed

> Request to retrieve flow in clustered environment should be much less 
> expensive
> -------------------------------------------------------------------------------
>
>                 Key: NIFI-6900
>                 URL: https://issues.apache.org/jira/browse/NIFI-6900
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Priority: Major
>              Labels: cluster, dto, performance, ui
>             Fix For: 1.15.0
>
>
> When a request is made to the `/nifi-api/flow/process-groups/{pgId}` 
> endpoint, the request must be replicated and the responses merged. The things 
> that need to be merged include bulletins, component permissions, statuses, 
> load balance indicators, validation errors, and perhaps a few others.
> However, each node currently responds with a fully populated 
> `ProcessGroupFlowEntity`. This entity contains all information that is needed 
> to display the current Process Group in the UI, as well as a lot of other 
> details. For example, it contains the Property Descriptors for every 
> component, including the property description, default value, etc. These 
> should only be needed when configuring a component, not to display the canvas.
> This request can take a while when the flow is large or when the cluster is 
> large, because the JSON must be parsed from every node in the cluster in 
> order to merge the responses. Profiling shows that the expense can be broken 
> down into two functions: parsing the nodes' responses into DTO objects and 
> merging the responses, with parsing being the dominant function in terms of 
> cost (over 80%).
> There are two big improvements that I think can be made:
>  * Null out some things from the DTO before returning the response, such as 
> Property Descriptors, Property Values, and almost all component 
> configuration. These should be fetched when the component is configured. 
> However, this change may require significant changes to the UI as well.
>  * Add a query parameter to the endpoint such as `minimal=true`. This query 
> parameter would default to `false` in order to maintain backward 
> compatibility but if set to `true`, the response would contain only the 
> information needed in order to assemble a full response to the client. To 
> accomplish this, one response would need to be fully populated (likely, this 
> would be whichever node is the Cluster Coordinator) and that response would 
> include a 'fullyPopulated' flag. This would be the 'clientResponse' that is 
> used when merging the node responses. All other nodes would first null out 
> the elements that are not required for merging, so their responses would 
> include only things like the bulletins, validation errors, status, etc. Even the status could be 
> further reduced by not including the "human readable" values but only the raw 
> numeric values, since the human readable values are ignored when merging 
> anyway.
> This would significantly reduce the amount of time taken to replicate this 
> request, which would provide the user with a far better experience due to the 
> significantly shorter response times.
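
The second proposal above can be illustrated with a minimal sketch. This is not NiFi's implementation: `ComponentSnapshot`, `toMinimal`, and `mergeInto` are hypothetical names, and the single class is a heavily simplified stand-in for the real DTOs. It only shows the idea that non-coordinator nodes drop the fields that merging never reads, while one fully populated response serves as the 'clientResponse'.

```java
import java.util.ArrayList;
import java.util.List;

public class MinimalFlowSketch {

    /** Hypothetical, simplified stand-in for one component in a node's response. */
    static class ComponentSnapshot {
        String id;
        List<String> propertyDescriptors = new ArrayList<>(); // configuration detail, never merged
        List<String> validationErrors = new ArrayList<>();    // merged across nodes
        List<String> bulletins = new ArrayList<>();           // merged across nodes
        long flowFilesQueued;                                  // raw numeric status, merged
        String queuedFormatted;                                // "human readable" status, ignored when merging

        ComponentSnapshot(String id) {
            this.id = id;
        }
    }

    /** What a non-coordinator node might return when minimal=true (assumed behavior). */
    static ComponentSnapshot toMinimal(ComponentSnapshot full) {
        ComponentSnapshot minimal = new ComponentSnapshot(full.id);
        minimal.propertyDescriptors = null; // not required for merging
        minimal.queuedFormatted = null;     // recomputed from raw values after the merge
        minimal.validationErrors = new ArrayList<>(full.validationErrors);
        minimal.bulletins = new ArrayList<>(full.bulletins);
        minimal.flowFilesQueued = full.flowFilesQueued;
        return minimal;
    }

    /** Merges minimal node responses into the fully populated client response. */
    static void mergeInto(ComponentSnapshot clientResponse, List<ComponentSnapshot> others) {
        for (ComponentSnapshot other : others) {
            for (String error : other.validationErrors) {
                if (!clientResponse.validationErrors.contains(error)) {
                    clientResponse.validationErrors.add(error);
                }
            }
            clientResponse.bulletins.addAll(other.bulletins);
            clientResponse.flowFilesQueued += other.flowFilesQueued;
        }
        // The human readable value is derived once, from the merged raw number.
        clientResponse.queuedFormatted = clientResponse.flowFilesQueued + " queued";
    }

    public static void main(String[] args) {
        ComponentSnapshot coordinator = new ComponentSnapshot("proc-1");
        coordinator.propertyDescriptors.add("Batch Size");
        coordinator.flowFilesQueued = 10;

        ComponentSnapshot node2 = new ComponentSnapshot("proc-1");
        node2.propertyDescriptors.add("Batch Size");
        node2.validationErrors.add("'Directory' is required");
        node2.flowFilesQueued = 5;

        mergeInto(coordinator, List.of(toMinimal(node2)));
        System.out.println(coordinator.queuedFormatted + ", errors=" + coordinator.validationErrors);
    }
}
```

In this sketch the merged result still carries the Property Descriptors, because they come only from the fully populated response; the other nodes contribute just the values that actually differ per node.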



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
