Mark Payne created NIFI-5480:
--------------------------------
Summary: Improve efficiency of how components are looked up by Identifier
Key: NIFI-5480
URL: https://issues.apache.org/jira/browse/NIFI-5480
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
When we look up a component by ID, we do so by obtaining the Root Process Group
and then calling {{findLocalConnectable(String id)}}. This method obtains a
read lock, then checks its map of Processors, its map of Input Ports, its map
of Output Ports, and its map of Funnels. If no match is found, it then calls
{{getRemoteProcessGroups()}} and iterates over each of those, looking for a
Remote Input/Output Port with that ID. This call to {{getRemoteProcessGroups()}}
creates a new {{HashSet}} that is then returned. If there is still no match, we
then call {{getProcessGroups()}}, which also creates a new {{HashSet}} of
ProcessGroup objects, and we iterate over those (recursively).
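To make the cost concrete, the lookup described above can be modeled roughly as
follows. This is a simplified sketch, not NiFi's actual code: the {{Group}} and
{{RemoteGroup}} classes, the two counters, and {{buildFlow}} are hypothetical
stand-ins, and only the three method names mentioned above come from the
framework.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified, hypothetical model of the lookup described above (not NiFi's classes).
public class NaiveLookupSketch {
    static int hashSetsCreated = 0;  // counts the defensive HashSet copies
    static int readLocksTaken = 0;   // counts read-lock acquisitions

    static class RemoteGroup {
        final Map<String, Object> ports = new HashMap<>();
    }

    static class Group {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final Map<String, Object> processors = new HashMap<>();
        final Map<String, Object> inputPorts = new HashMap<>();
        final Map<String, Object> outputPorts = new HashMap<>();
        final Map<String, Object> funnels = new HashMap<>();
        final Map<String, RemoteGroup> remoteGroups = new HashMap<>();
        final Map<String, Group> childGroups = new HashMap<>();

        // Each getter returns a fresh copy, which is the allocation being counted.
        Set<RemoteGroup> getRemoteProcessGroups() {
            hashSetsCreated++;
            return new HashSet<>(remoteGroups.values());
        }

        Set<Group> getProcessGroups() {
            hashSetsCreated++;
            return new HashSet<>(childGroups.values());
        }

        Object findLocalConnectable(String id) {
            lock.readLock().lock();
            readLocksTaken++;
            try {
                // Check the four local maps first.
                Object found = processors.get(id);
                if (found == null) found = inputPorts.get(id);
                if (found == null) found = outputPorts.get(id);
                if (found == null) found = funnels.get(id);
                if (found != null) return found;
            } finally {
                lock.readLock().unlock();
            }
            // Then the remote ports, via a newly allocated HashSet.
            for (RemoteGroup remote : getRemoteProcessGroups()) {
                Object port = remote.ports.get(id);
                if (port != null) return port;
            }
            // Finally recurse into child groups, via another new HashSet.
            for (Group child : getProcessGroups()) {
                Object found = child.findLocalConnectable(id);
                if (found != null) return found;
            }
            return null;
        }
    }

    // Builds a root group with the given number of direct child groups.
    static Group buildFlow(int childCount) {
        Group root = new Group();
        for (int i = 0; i < childCount; i++) {
            root.childGroups.put("group-" + i, new Group());
        }
        return root;
    }

    public static void main(String[] args) {
        Group root = buildFlow(11); // root + 11 children = 12 groups
        root.findLocalConnectable("no-such-id"); // worst case: every group is visited
        System.out.println("HashSets created: " + hashSetsCreated); // 24
        System.out.println("Read locks taken: " + readLocksTaken);  // 12
    }
}
```

In this model a single lookup that visits all 12 groups allocates 24
{{HashSet}}s and takes 12 read locks.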
This means that each call to look up a component by ID creates two {{HashSet}}s
for every Process Group it visits on the canvas, until the component is found.
Consider a flow that has a dozen Process Groups and several thousand
Processors/ports/funnels. If we then click "Start" on the root group, a single
lookup may create up to 24 {{HashSet}} objects and obtain 12 Read Locks. This is
done for each component, so for 1,000 Processors it will create up to 24,000
{{HashSet}}s and obtain up to 12,000 Read Locks. Also, since this is a mutable
request, this has to be done for both the first and second phase of the request,
which results in a total of up to 48,000 {{HashSet}}s created and 24,000 Read
Locks obtained.
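The worst-case arithmetic above can be reproduced with a few lines of Java. The
class and method names here are made up for illustration, and the formula
assumes every lookup visits every Process Group.

```java
public class LookupCostEstimate {
    // Worst case: each lookup visits every group, and each visit allocates two HashSets.
    static long hashSets(int groups, int components, int phases) {
        return 2L * groups * components * phases;
    }

    // Worst case: one read lock per group visited, per lookup, per phase.
    static long readLocks(int groups, int components, int phases) {
        return (long) groups * components * phases;
    }

    public static void main(String[] args) {
        // 12 Process Groups, 1,000 Processors, two request phases.
        System.out.println(hashSets(12, 1000, 2));  // 48000
        System.out.println(readLocks(12, 1000, 2)); // 24000
    }
}
```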
Testing with 10,000 Processors, I am seeing requests take well over 30 seconds
to complete, all just to find components by identifier. We can make this much
more efficient.
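This issue does not say how the lookup will be improved. One possible direction,
offered purely as an assumption and not as the planned fix, is a single
flow-wide index keyed by component ID, so that a lookup is one map read with no
lock and no allocation. Every name in this sketch ({{ComponentIndexSketch}},
{{onComponentAdded}}, {{onComponentRemoved}}, {{find}}) is hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical flow-wide index; an illustration only, not NiFi's implementation.
public class ComponentIndexSketch {
    private final ConcurrentMap<String, Object> byId = new ConcurrentHashMap<>();

    // Would be called whenever a component is added to any Process Group.
    public void onComponentAdded(String id, Object component) {
        byId.put(id, component);
    }

    // Would be called whenever a component is removed from any Process Group.
    public void onComponentRemoved(String id) {
        byId.remove(id);
    }

    // Constant-time lookup: no recursion, no HashSet copies, no explicit read lock.
    public Optional<Object> find(String id) {
        return Optional.ofNullable(byId.get(id));
    }

    public static void main(String[] args) {
        ComponentIndexSketch index = new ComponentIndexSketch();
        index.onComponentAdded("processor-1", "some processor");
        System.out.println(index.find("processor-1").isPresent()); // true
        index.onComponentRemoved("processor-1");
        System.out.println(index.find("processor-1").isPresent()); // false
    }
}
```

The trade-off in such a design is that every add and remove must keep the index
in sync with the Process Groups' own maps.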