Mark Payne created NIFI-6900:
--------------------------------
Summary: Request to retrieve flow in clustered environment should
be much less expensive
Key: NIFI-6900
URL: https://issues.apache.org/jira/browse/NIFI-6900
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne
When a request is made to the `/nifi-api/flow/process-groups/{pgId}` endpoint,
the request must be replicated and the responses merged. The things that need
to be merged include bulletins, component permissions, statuses, load balance
indicators, validation errors, and perhaps a few others.
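As a rough illustration of what "merging" means here, the sketch below folds several per-node views of one component into a single view. `NodeView` and `MergeSketch` are hypothetical stand-ins, not NiFi's DTO or merger classes, and the merge rules shown (permissions combined with AND, counts summed, bulletins concatenated, validation errors de-duplicated) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified view of one component as reported by one node.
// This is not one of NiFi's real DTO classes.
final class NodeView {
    final boolean canRead;
    final long flowFilesQueued;
    final List<String> bulletins;
    final List<String> validationErrors;

    NodeView(boolean canRead, long flowFilesQueued,
             List<String> bulletins, List<String> validationErrors) {
        this.canRead = canRead;
        this.flowFilesQueued = flowFilesQueued;
        this.bulletins = bulletins;
        this.validationErrors = validationErrors;
    }
}

final class MergeSketch {
    // Folds the per-node views into one. Merge rules are assumptions for illustration.
    static NodeView merge(List<NodeView> perNode) {
        boolean canRead = true;                      // readable only if every node allows it
        long queued = 0;                             // numeric statuses are summed
        List<String> bulletins = new ArrayList<>();  // bulletins from all nodes are kept
        Set<String> errors = new LinkedHashSet<>();  // identical validation errors collapse
        for (NodeView view : perNode) {
            canRead &= view.canRead;
            queued += view.flowFilesQueued;
            bulletins.addAll(view.bulletins);
            errors.addAll(view.validationErrors);
        }
        return new NodeView(canRead, queued, bulletins, new ArrayList<>(errors));
    }
}
```

Note that nothing in this merge touches component configuration such as Property Descriptors; only the per-node fields are combined.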
However, each node currently responds with a fully populated
`ProcessGroupFlowEntity`. This entity contains all information that is needed
to display the current Process Group in the UI, as well as a lot of other
details. For example, it contains the Property Descriptors for every component,
including the property description, default value, etc. These should only be
needed when configuring a component, not to display the canvas.
This request can take a while when the flow is large or when the cluster is
large, because the JSON response from every node in the cluster must be parsed
in order to merge the responses. Profiling shows that the expense breaks down
into two steps: parsing the nodes' responses into DTO objects and merging
the responses, with parsing being the dominant cost (over 80%).
There are two big improvements that I think can be made:
* Null out some things from the DTO before returning the response, such as
Property Descriptors, Property Values, and almost all component configuration.
These should be fetched when the component is configured. However, this change
may require significant changes to the UI, as well.
* Add a query parameter to the endpoint such as `minimal=true`. This query
parameter would default to `false` in order to maintain backward compatibility
but if set to `true`, the response would contain only the information needed in
order to assemble a full response to the client. To accomplish this, one
response would need to be fully populated (likely, this would be whichever node
is the Cluster Coordinator) and that response would include a 'fullyPopulated'
flag. This would be the 'clientResponse' that is used when merging the node
responses. All other nodes would first null out the elements that are not
required for merging, so their responses would include only things like the
bulletins, validation errors, status, etc. Even the status could be further
reduced by not including
the "human readable" values but only the raw numeric values, since the human
readable values are ignored when merging anyway.
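The second proposal could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual NiFi implementation: `FlowResponse`, `MinimalFlowSketch`, `toMinimal`, and `merge` are hypothetical names, and `FlowResponse` is a heavily simplified stand-in for `ProcessGroupFlowEntity`. Only the `minimal` idea and the `fullyPopulated` flag come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a node's response (not ProcessGroupFlowEntity).
final class FlowResponse {
    boolean fullyPopulated;                       // flag proposed in this ticket
    Map<String, String> properties;               // configuration, not needed for merging
    List<String> bulletins = new ArrayList<>();
    List<String> validationErrors = new ArrayList<>();
    long bytesQueued;                             // raw numeric status value
    String bytesQueuedReadable;                   // "human readable" value, ignored when merging
}

final class MinimalFlowSketch {
    // What a node other than the fully populated one might return when minimal=true.
    static FlowResponse toMinimal(FlowResponse full) {
        FlowResponse minimal = new FlowResponse();
        minimal.fullyPopulated = false;
        minimal.properties = null;                // nulled out: not required for merging
        minimal.bulletins = new ArrayList<>(full.bulletins);
        minimal.validationErrors = new ArrayList<>(full.validationErrors);
        minimal.bytesQueued = full.bytesQueued;
        minimal.bytesQueuedReadable = null;       // recomputed after the merge
        return minimal;
    }

    // Uses the fully populated response as the client response and folds in the rest.
    static FlowResponse merge(List<FlowResponse> responses) {
        FlowResponse client = null;
        for (FlowResponse r : responses) {
            if (r.fullyPopulated) { client = r; break; }
        }
        if (client == null) {
            throw new IllegalStateException("no fully populated response to merge into");
        }
        FlowResponse merged = new FlowResponse();
        merged.fullyPopulated = true;
        merged.properties = client.properties == null ? null : new HashMap<>(client.properties);
        merged.bulletins.addAll(client.bulletins);
        merged.validationErrors.addAll(client.validationErrors);
        merged.bytesQueued = client.bytesQueued;
        for (FlowResponse r : responses) {
            if (r == client) continue;
            merged.bulletins.addAll(r.bulletins);
            for (String error : r.validationErrors) {
                if (!merged.validationErrors.contains(error)) merged.validationErrors.add(error);
            }
            merged.bytesQueued += r.bytesQueued;
        }
        // Placeholder formatting; the readable value is derived only once, after merging.
        merged.bytesQueuedReadable = merged.bytesQueued + " bytes";
        return merged;
    }
}
```

In this sketch the configuration is carried only by the one fully populated response, so the cost of parsing it is paid once rather than once per node.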
This would significantly reduce the amount of time taken to replicate this
request, which would provide the user with a far better experience due to the
significantly shorter response times.