[jira] [Created] (NIFI-5112) Inefficiency in replicating requests across cluster

Mark Payne (JIRA) Mon, 23 Apr 2018 11:34:41 -0700

Mark Payne created NIFI-5112:
--------------------------------

             Summary: Inefficiency in replicating requests across cluster
                 Key: NIFI-5112
                 URL: https://issues.apache.org/jira/browse/NIFI-5112
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
            Reporter: Mark Payne
            Assignee: Mark Payne



When replicating requests across the cluster, we do some things that are rather 
inefficient, which can cause the UI to feel sluggish. Because all of this is 
done while the UI awaits a response, we need to ensure that this area of the 
application is very responsive. Through profiling and code review, I have 
identified the following places where we can improve our efficiency:
 * Use of Jersey Client. Jersey Client provides a very easy-to-use API that is 
very powerful. It provides a lot of capabilities to scan class paths and 
automatically detect interceptors, etc. However, doing this comes at a cost. 
Profiling shows that, on my laptop, replicating a single request took about 
100 milliseconds on average, nearly all of which was spent constructing the 
Jersey objects. Less than 1 millisecond was spent writing the message 
to the socket, awaiting the reply, and parsing the response. By using a 
different client, we can significantly improve this.
 * Flow Serialization holds a Flow Controller Read lock for the entire 
duration. This means that we block any mutating operations, such as HTTP PUT 
or POST requests, while we build the appropriate DOM object for the flow, transform 
that DOM object into a String, and write that String to the output stream 
(including compression). We should be able to hold the Read Lock only while 
building the appropriate DOM object and then perform the 
transformation/serialization outside of the lock.
 * Template Serialization is inefficient. Currently, for each template, we 
serialize the DTO object to a String, then deserialize that String into a DOM 
object (all of this is done in order to avoid XML-based injection attacks). We 
then add that DOM object into our flow's DOM object. We should instead hold 
onto/cache that DOM object so that we can cut out all of the above for all but 
the first iteration.
 * ReflectionUtils is used when a Processor is created in order to call any 
method annotated with @OnAdded. The implementation uses some Spring-based 
reflection utils in order to find any bridge methods. Doing this is 
expensive (on the order of 1 ms on my laptop). While this may not sound like a 
concern, it means that importing a template consisting of 5,000 processors 
will take 5 seconds just to find annotated methods, all within the context of a 
web request. Since these methods will not change, we should instead cache a 
list of Methods that contain the annotations so that we don't have to 
constantly look these up.
 * Authorization uses InvocationHandlers. These InvocationHandlers use 
reflection to compare the method being called to a well-known method. The call 
to Method.equals() is not expensive. However, the call to Class.getMethod() is 
expensive and is done for every single authorization check, which can amount to 
a significant amount of time being spent. Instead, we can store the method of 
interest in a member variable and reference that.
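
To illustrate the Flow Serialization point, here is a minimal sketch of holding the read lock only long enough to copy the state, then doing the slow serialization outside the lock. FlowHolder, its fields, and the XML layout are hypothetical stand-ins, not NiFi's actual classes, and XML escaping is omitted for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the flow controller; not NiFi's actual class.
class FlowHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> processorNames = new ArrayList<>();

    // Mutating operations take the write lock.
    void addProcessor(String name) {
        lock.writeLock().lock();
        try {
            processorNames.add(name);
        } finally {
            lock.writeLock().unlock();
        }
    }

    byte[] serialize() {
        // Hold the read lock only while copying the state into a snapshot.
        final List<String> snapshot;
        lock.readLock().lock();
        try {
            snapshot = new ArrayList<>(processorNames);
        } finally {
            lock.readLock().unlock();
        }

        // The slow part runs outside the lock, so writers are not blocked by it.
        // (Escaping of names is omitted in this sketch.)
        StringBuilder xml = new StringBuilder("<flow>");
        for (String name : snapshot) {
            xml.append("<processor>").append(name).append("</processor>");
        }
        xml.append("</flow>");
        return xml.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Exposed only so the sketch can show that no read lock remains held.
    boolean isReadLocked() {
        return lock.getReadLockCount() > 0;
    }
}
```

The snapshot plays the role of the DOM object in the description above: once it is built, nothing else in serialize() touches shared state.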
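
For the Template Serialization point, a rough sketch of caching the parsed DOM element per template so that the String-to-DOM step runs only once. TemplateDomCache and its method names are assumptions made for illustration. The sketch also leaves out parser hardening and cache invalidation when a template changes, both of which a real implementation would need.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical cache of parsed template elements, keyed by template id.
class TemplateDomCache {
    private final Map<String, Element> cache = new HashMap<>();
    int parses; // counts how often the XML was actually parsed

    // Appends the template to the flow document's root element.
    // Synchronized because DOM implementations are not guaranteed to be thread-safe.
    synchronized void addTemplate(Document flowDoc, String templateId, String templateXml) {
        Element cached = cache.computeIfAbsent(templateId, id -> parse(templateXml));
        // importNode makes a deep copy owned by flowDoc; the cached element stays untouched.
        flowDoc.getDocumentElement().appendChild(flowDoc.importNode(cached, true));
    }

    private Element parse(String xml) {
        parses++;
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse template XML", e);
        }
    }

    // Helper that creates an empty document with a <flow> root element.
    static Document newFlowDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("flow"));
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not create flow document", e);
        }
    }
}
```

Each serialization of the flow then pays only for the deep copy made by importNode, not for re-serializing and re-parsing the template.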
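
For the ReflectionUtils point, one possible shape for the cache of annotated methods is sketched below. OnAddedExample is a hypothetical stand-in for @OnAdded, and AnnotatedMethodCache is an illustrative name. The sketch uses plain java.lang.reflect rather than the Spring utilities and simply skips bridge methods.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for NiFi's @OnAdded annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnAddedExample {}

// Caches, per class, the public methods that carry the annotation.
class AnnotatedMethodCache {
    private final ConcurrentMap<Class<?>, List<Method>> cache = new ConcurrentHashMap<>();
    final AtomicInteger scans = new AtomicInteger(); // counts the expensive lookups

    List<Method> methodsFor(Class<?> type) {
        // The scan runs at most once per class; later calls return the cached list.
        return cache.computeIfAbsent(type, this::scan);
    }

    private List<Method> scan(Class<?> type) {
        scans.incrementAndGet();
        List<Method> found = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (method.isAnnotationPresent(OnAddedExample.class) && !method.isBridge()) {
                found.add(method);
            }
        }
        return Collections.unmodifiableList(found);
    }
}

// Hypothetical processor with one annotated method.
class ExampleProcessor {
    @OnAddedExample
    public void init() {}

    public void other() {}
}
```

With a cache like this, creating 5,000 processors of the same class would pay the lookup cost once rather than 5,000 times.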
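
For the Authorization point, the following sketch resolves the method of interest once and keeps it in a field, so each invocation only pays for Method.equals(). Greeter, SimpleGreeter, and CheckingHandler are hypothetical names. A static final field is used here, though an instance field set in the constructor would serve the same purpose.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical interface being proxied.
interface Greeter {
    String greet(String name);
    String farewell(String name);
}

class SimpleGreeter implements Greeter {
    public String greet(String name) { return "hello " + name; }
    public String farewell(String name) { return "bye " + name; }
}

class CheckingHandler implements InvocationHandler {
    // Looked up once when the class loads, instead of on every invocation.
    private static final Method GREET;
    static {
        try {
            GREET = Greeter.class.getMethod("greet", String.class);
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final Greeter delegate;
    int checks; // how many calls matched the method of interest

    CheckingHandler(Greeter delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Cheap comparison against the stored Method; no Class.getMethod() call here.
        if (GREET.equals(method)) {
            checks++; // placeholder for the real authorization check
        }
        try {
            return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the delegate's own exception
        }
    }
}
```

A proxy built with Proxy.newProxyInstance(Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler) routes every call through invoke(), and only calls to greet() trigger the check.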



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
