Mark Payne created NIFI-6900:
--------------------------------

             Summary: Request to retrieve flow in clustered environment should 
be much less expensive
                 Key: NIFI-6900
                 URL: https://issues.apache.org/jira/browse/NIFI-6900
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Mark Payne


When a request is made to the `/nifi-api/flow/process-groups/{pgId}` endpoint, 
the request must be replicated to every node in the cluster and the responses 
merged. The fields that need to be merged include bulletins, component 
permissions, statuses, load balance indicators, validation errors, and perhaps 
a few others.
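
To make the merge step concrete, here is a minimal Python sketch of folding 
per-node responses into one client response. It is not NiFi's actual merger: 
the flat dictionary shape, the field names `canRead` and `flowFilesQueued`, and 
the rule that a permission holds only if every node grants it are all 
assumptions made for illustration.

```python
from typing import Any

# Hypothetical, simplified per-node response shape; the real
# ProcessGroupFlowEntity is nested and has many more fields.
NodeResponse = dict[str, Any]


def merge_responses(client_response: NodeResponse,
                    node_responses: list[NodeResponse]) -> NodeResponse:
    """Fold the mergeable fields of every node into a copy of the client response.

    node_responses is expected to contain every node's response,
    including the one chosen as client_response.
    """
    merged = dict(client_response)  # shallow copy; non-merged fields pass through
    merged["bulletins"] = []
    errors: set[str] = set()
    merged["canRead"] = True
    merged["flowFilesQueued"] = 0

    for resp in node_responses:
        # Bulletins from every node are shown, so concatenate them.
        merged["bulletins"].extend(resp.get("bulletins", []))
        # Validation errors are collected across nodes without duplicates.
        errors.update(resp.get("validationErrors", []))
        # Assumption: a permission holds only if every node grants it.
        merged["canRead"] = merged["canRead"] and resp.get("canRead", False)
        # Numeric statuses are summed across the cluster.
        merged["flowFilesQueued"] += resp.get("flowFilesQueued", 0)

    merged["validationErrors"] = sorted(errors)
    return merged
```

Note that only a handful of fields are read from each node in this sketch; 
everything else in the result comes from the single client response.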

However, each node currently responds with a fully populated 
`ProcessGroupFlowEntity`. This entity contains all information that is needed 
to display the current Process Group in the UI, as well as a lot of other 
details. For example, it contains the Property Descriptors for every component, 
including the property description, default value, etc. These should only be 
needed when configuring a component, not to display the canvas.

This request can take a while when the flow is large or when the cluster is 
large, because the JSON must be parsed from every node in the cluster in order 
to merge the responses. Profiling shows that the expense can be broken down 
into two functions: parsing the nodes' responses into DTO objects and merging 
the responses, with parsing being the dominant function in terms of cost (over 
80%).

There are two big improvements that I think can be made:
 * Null out some elements of the DTO before returning the response, such as 
Property Descriptors, Property Values, and almost all component configuration. 
These should instead be fetched when the component is configured. However, this 
change may require significant changes to the UI as well.
 * Add a query parameter to the endpoint such as `minimal=true`. This query 
parameter would default to `false` in order to maintain backward compatibility 
but if set to `true`, the response would contain only the information needed in 
order to assemble a full response to the client. To accomplish this, one 
response would need to be fully populated (likely, this would be whichever node 
is the Cluster Coordinator) and that response would include a 'fullyPopulated' 
flag. This would be the 'clientResponse' that is used when merging the node 
responses. All other nodes would first null out the elements that are not 
required for merging, so their responses would include only things like the 
bulletins, validation 
errors, status, etc. Even the status could be further reduced by not including 
the "human readable" values but only the raw numeric values, since the human 
readable values are ignored when merging anyway.
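
The `minimal=true` idea in the second item could be sketched roughly as 
follows. This is a hedged Python illustration rather than the proposed 
implementation: `MERGE_FIELDS`, `minimize_component`, `build_node_response`, 
and the component field names are hypothetical, and treating every non-numeric 
status value as "human readable" is a simplifying assumption.

```python
from typing import Any

# Assumed set of fields the merge step reads. The issue names bulletins,
# validation errors, and status "etc.", so the real list may differ.
MERGE_FIELDS = {"id", "bulletins", "validationErrors", "status", "permissions"}


def minimize_component(component: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields needed for merging; drop descriptors, config, etc."""
    slim = {k: v for k, v in component.items() if k in MERGE_FIELDS}
    status = slim.get("status")
    if isinstance(status, dict):
        # Assumption: numeric entries are the raw values used when merging,
        # and string entries are the human-readable renderings.
        slim["status"] = {k: v for k, v in status.items()
                          if isinstance(v, (int, float))}
    return slim


def build_node_response(components: list[dict[str, Any]],
                        minimal: bool,
                        is_coordinator: bool) -> dict[str, Any]:
    """Build one node's response to the replicated request."""
    # With minimal=false every node answers in full, as today. With
    # minimal=true only the coordinator (hypothetically) stays fully populated.
    fully_populated = is_coordinator or not minimal
    if not fully_populated:
        components = [minimize_component(c) for c in components]
    return {"fullyPopulated": fully_populated, "components": components}
```

The node that merges the responses would then pick the one with 
`fullyPopulated` set as the 'clientResponse' and read only the slim fields 
from the rest.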

This would significantly reduce the amount of time taken to replicate this 
request, giving the user a far better experience through shorter response 
times.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
