Mark Payne created NIFI-5112:
--------------------------------
Summary: Inefficiency in replicating requests across cluster
Key: NIFI-5112
URL: https://issues.apache.org/jira/browse/NIFI-5112
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Reporter: Mark Payne
Assignee: Mark Payne
When replicating requests across the cluster, we do some things that are rather
inefficient, which can cause the UI to feel sluggish. Because all of this is
done while the UI awaits a response, we need to ensure that this area of the
application is very responsive. Through profiling and code review, I have
identified the following places where we can improve our efficiency:
* Use of Jersey Client. Jersey Client provides a very easy-to-use API that is
very powerful. It provides a lot of capabilities to scan class paths and
automatically detect interceptors, etc. However, doing this comes at a cost.
Profiling shows that, on average, on my laptop replicating a single request
took about 100 milliseconds, nearly all of which was spent constructing the
Jersey objects. Less than 1 millisecond of time was spent writing the message
to the socket, awaiting the reply, and parsing the response. By using a
different client, we can significantly improve this.
* Flow Serialization holds a Flow Controller Read lock for the entire
duration. This means that we block any mutating operations, such as HTTP PUT,
POST, or DELETE requests, while we build the appropriate DOM object for the
flow, transform
that DOM object into a String, and write that String to the output stream
(including compression). We should be able to hold the Read Lock only while
building the appropriate DOM object and then perform the
transformation/serialization outside of the lock.
* Template Serialization is inefficient. Currently, for each template, we
serialize the DTO object to a String, then Deserialize that String into a DOM
object (all of this is done in order to avoid XML-based injection attacks). We
then add that DOM object into our flow's DOM object. We should instead hold
onto/cache that DOM object so that we can cut out all of the above for all but
the first iteration.
* ReflectionUtils is used when a Processor is created in order to call any
method annotated with @OnAdded. The implementation uses some Spring-based
reflection utils in order to find any sort of Bridged methods. Doing this is
expensive (on the order of 1 ms on my laptop). While this may not sound like a
concern, that means that importing a template consisting of 5,000 processors
will take 5 seconds just to find annotated methods, all within the context of a
web request. Since these methods will not change, we should instead cache a
list of Methods that contain the annotations so that we don't have to
constantly look these up.
* Authorization uses InvocationHandlers. These InvocationHandlers use
reflection to compare the method being called to a well-known method. The call
to Method.equals() is not expensive. However, the call to Class.getMethod() is
expensive and is done for every single authorization check, which can amount to
a significant amount of time being spent. Instead, we can store the method of
interest in a member variable and reference that.
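
To illustrate the Flow Serialization point, here is a minimal sketch of
narrowing the read lock so that it covers only the copy of in-memory state. The
class FlowSnapshotExample and its methods are hypothetical and are not NiFi's
actual classes; a plain list of names stands in for the DOM object, and the
string building stands in for the transformation and compression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the flow controller; not NiFi's actual class.
class FlowSnapshotExample {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> processorNames = new ArrayList<>();

    // A mutating operation: needs the write lock, so it waits for readers.
    void addProcessor(String name) {
        lock.writeLock().lock();
        try {
            processorNames.add(name);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Only the cheap copy of in-memory state happens under the read lock.
    private List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(processorNames);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The slower serialization runs after the read lock has been released.
    // Toy output only: no XML escaping is performed here.
    String serialize() {
        List<String> copy = snapshot();
        StringBuilder sb = new StringBuilder("<flow>");
        for (String name : copy) {
            sb.append("<processor>").append(name).append("</processor>");
        }
        return sb.append("</flow>").toString();
    }
}
```

With this shape, a writer is blocked only for the duration of the copy rather
than for the whole serialization.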
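
For the Template Serialization point, one possible shape is to parse each
template's XML once and import the cached element into the flow document on
later passes. This is only a sketch under assumptions: TemplateDomCache, the
template id key, and the XML string input are all hypothetical, and parser
hardening against injection is omitted for brevity.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical cache of parsed template DOM elements, keyed by template id.
class TemplateDomCache {
    private final Map<String, Element> cache = new HashMap<>();
    private int parses = 0;

    // Helper that hides the checked exception of creating an empty document.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create document", e);
        }
    }

    // The expensive step that should run only on the first use of a template.
    private Element parse(String xml) {
        parses++;
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse template XML", e);
        }
    }

    // Synchronized because DOM nodes are not guaranteed to be safe for
    // concurrent access, even when only reading.
    synchronized void appendTemplate(Document flowDoc, Element parent,
                                     String templateId, String templateXml) {
        Element cached = cache.get(templateId);
        if (cached == null) {
            cached = parse(templateXml);
            cache.put(templateId, cached);
        }
        // importNode makes a deep copy owned by the flow document.
        parent.appendChild(flowDoc.importNode(cached, true));
    }

    synchronized int parseCount() {
        return parses;
    }
}
```

A real cache would also need to be invalidated whenever a template is modified
or removed.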
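
The ReflectionUtils point could be addressed with a per-class cache of
annotated methods, roughly as sketched below. The annotation ExampleOnAdded
stands in for @OnAdded, and AnnotatedMethodCache and ExampleProcessor are
hypothetical names. This sketch uses plain Class.getMethods() and does not
attempt the bridged-method handling that the Spring-based utilities provide.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical marker annotation standing in for @OnAdded.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ExampleOnAdded {}

// Hypothetical component with one annotated lifecycle method.
class ExampleProcessor {
    @ExampleOnAdded
    public void onAdded() {}

    public void unrelated() {}
}

class AnnotatedMethodCache {
    // Keyed by Class objects rather than names, since the same class name
    // can exist in more than one class loader.
    private static final Map<Class<?>, Map<Class<? extends Annotation>, List<Method>>> CACHE =
            new ConcurrentHashMap<>();
    private static final AtomicInteger SCANS = new AtomicInteger();

    // Returns the cached list, scanning the class only on the first request.
    static List<Method> methodsWith(Class<?> type, Class<? extends Annotation> annotation) {
        return CACHE
                .computeIfAbsent(type, t -> new ConcurrentHashMap<>())
                .computeIfAbsent(annotation, a -> scan(type, a));
    }

    // The expensive reflective lookup that the cache is meant to avoid repeating.
    private static List<Method> scan(Class<?> type, Class<? extends Annotation> annotation) {
        SCANS.incrementAndGet();
        List<Method> found = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.isAnnotationPresent(annotation)) {
                found.add(m);
            }
        }
        return Collections.unmodifiableList(found);
    }

    static int scanCount() {
        return SCANS.get();
    }
}
```

Because the cache holds strong references to Class objects, a production
version would need some way to release entries when a class loader is
discarded.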
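
For the Authorization point, the idea of resolving the method of interest once
and comparing against the stored reference might look like the following
sketch. ExampleAuthorizer and CachedMethodHandler are hypothetical names, not
NiFi's actual types, and the counter exists only to make the comparison
observable.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical interface standing in for the authorizer being proxied.
interface ExampleAuthorizer {
    String authorize(String user);
}

class CachedMethodHandler implements InvocationHandler {
    // Class.getMethod() runs once here instead of on every invocation.
    private static final Method AUTHORIZE;
    static {
        try {
            AUTHORIZE = ExampleAuthorizer.class.getMethod("authorize", String.class);
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final ExampleAuthorizer delegate;
    private int intercepted = 0;

    CachedMethodHandler(ExampleAuthorizer delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Only the cheap Method.equals() comparison remains on the hot path.
        if (AUTHORIZE.equals(method)) {
            intercepted++;
        }
        try {
            return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    // Builds a dynamic proxy that routes every call through this handler.
    ExampleAuthorizer proxy() {
        return (ExampleAuthorizer) Proxy.newProxyInstance(
                ExampleAuthorizer.class.getClassLoader(),
                new Class<?>[] {ExampleAuthorizer.class},
                this);
    }

    int interceptedCount() {
        return intercepted;
    }
}
```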
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)