[ 
https://issues.apache.org/jira/browse/NIFI-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448742#comment-16448742
 ] 

Otto Fowler commented on NIFI-5112:
-----------------------------------

Do you have any reports that can show the problem and be used to compare the 
result and measure improvement?

> Inefficiency in replicating requests across cluster
> ---------------------------------------------------
>
>                 Key: NIFI-5112
>                 URL: https://issues.apache.org/jira/browse/NIFI-5112
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> When replicating requests across the cluster, we do some things that are 
> rather inefficient, which can cause the UI to feel sluggish. Because all of 
> this is done while the UI awaits a response, we need to ensure that this area 
> of the application is very responsive. Through profiling and code review, I 
> have identified the following places where we can improve our efficiency:
>  * Use of Jersey Client. Jersey Client provides a very easy-to-use API that 
> is very powerful. It provides a lot of capabilities to scan class paths and 
> automatically detect interceptors, etc. However, doing this comes at a cost. 
> Profiling shows that, on average, on my laptop replicating a single request 
> took about 100 milliseconds, 100% of which was spent actually constructing 
> the Jersey objects. Less than 1 millisecond of time was spent writing the 
> message to the socket, awaiting the reply, and parsing the response. By using 
> a different client, we can significantly improve this.
>  * Flow Serialization holds a Flow Controller Read lock for the entire 
> duration. This means that we block any mutable operations, such as HTTP GET 
> requests, while we build the appropriate DOM object for the flow, transform 
> that DOM object into a String, and write that String to the output stream 
> (including compression). We should be able to hold the Read Lock only while 
> building the appropriate DOM object and then perform the 
> transformation/serialization outside of the lock.
>  * Template Serialization is inefficient. Currently, for each template, we 
> serialize the DTO object to a String, then Deserialize that String into a DOM 
> object (all of this is done in order to avoid XML-based injection attacks). 
> We then add that DOM object into our flow's DOM object. We should instead 
> hold onto/cache that DOM object so that we can cut out all of the above for 
> all but the first iteration.
>  * ReflectionUtils is used when a Processor is created in order to call any 
> method annotated with @OnAdded. The implementation uses some Spring-based 
> reflection utils in order to find any sort of Bridged methods. Doing this is 
> expensive (on the order of 1 ms on my laptop). While this may not sound like 
> a concern, that means that importing a template consisting of 5,000 
> processors will take 5 seconds just to find annotated methods, all within the 
> context of a web request. Since these methods will not change, we should 
> instead cache a list of Methods that contain the annotations so that we don't 
> have to constantly look these up.
>  * Authorization uses InvocationHandlers. These InvocationHandlers use 
> reflection to compare the method being called to a well-known method. The 
> call to Method.equals() is not expensive. However, the call to 
> Class.getMethod() is expensive and is done for every single authorization 
> check, which can amount to a significant amount of time being spent. Instead, 
> we can store the method of interest in a member variable and reference that.
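
To make the Flow Serialization point concrete, here is a minimal sketch of 
narrowing the lock scope. It is not NiFi's actual code: ExampleFlow and its 
methods are hypothetical names, and a plain list copy stands in for building 
the DOM object. The read lock is held only while the state is copied, and the 
slower String building happens after it is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the flow controller; not NiFi's actual class.
final class ExampleFlow {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> processors = new ArrayList<>();

    // Mutating operation: needs the write lock.
    void addProcessor(String name) {
        lock.writeLock().lock();
        try {
            processors.add(name);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Copy the state under the read lock (analogous to building the DOM object).
    private List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(processors);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The read lock is already released when the (slow) serialization runs.
    // Note: no XML escaping is done here; this only illustrates the lock scope.
    String serialize() {
        List<String> copy = snapshot();
        StringBuilder xml = new StringBuilder("<flow>");
        for (String name : copy) {
            xml.append("<processor>").append(name).append("</processor>");
        }
        return xml.append("</flow>").toString();
    }

    // Number of read locks currently held; exposed only for demonstration.
    int readLockCount() {
        return lock.getReadLockCount();
    }
}
```

With this shape, a writer only has to wait for the copy to finish, not for 
the transformation and output.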

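For the ReflectionUtils point, a rough sketch of caching the annotated 
methods per class might look like the following. Everything here is 
illustrative: OnAddedExample stands in for @OnAdded, AnnotatedMethodCache is a 
hypothetical name, and the scan is deliberately simplified (public methods 
only, no handling of bridged methods).

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for NiFi's @OnAdded lifecycle annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OnAddedExample {}

// Hypothetical cache; not NiFi's actual ReflectionUtils.
final class AnnotatedMethodCache {
    private final Map<Class<?>, Map<Class<? extends Annotation>, List<Method>>> cache =
            new ConcurrentHashMap<>();
    private final AtomicInteger scans = new AtomicInteger();

    // Returns the cached list, scanning the class only on the first request.
    List<Method> methodsAnnotatedWith(Class<?> type, Class<? extends Annotation> annotation) {
        return cache
                .computeIfAbsent(type, t -> new ConcurrentHashMap<>())
                .computeIfAbsent(annotation, a -> scan(type, a));
    }

    // Simplified scan: public methods only, no bridged-method resolution.
    private List<Method> scan(Class<?> type, Class<? extends Annotation> annotation) {
        scans.incrementAndGet();
        List<Method> found = new ArrayList<>();
        for (Method method : type.getMethods()) {
            if (method.isAnnotationPresent(annotation)) {
                found.add(method);
            }
        }
        return Collections.unmodifiableList(found);
    }

    // How many reflective scans have actually run; exposed for demonstration.
    int scanCount() {
        return scans.get();
    }
}

// Hypothetical processor used to exercise the cache.
class ExampleProcessor {
    @OnAddedExample
    public void onAdded() {}

    public void unrelated() {}
}
```

Creating 5,000 instances of the same processor class would then pay for the 
reflective scan once rather than 5,000 times.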

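For the Authorization point, the idea of looking the Method up once and 
keeping it in a field could be sketched as below. The interface and class 
names (ExampleAuthorizer, SimpleAuthorizer, CachedMethodHandler) are made up 
for illustration and are not NiFi's; only the java.lang.reflect calls are 
real.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical interface; not NiFi's actual Authorizer.
interface ExampleAuthorizer {
    String authorize(String resource);
    String describe();
}

// Hypothetical delegate implementation.
final class SimpleAuthorizer implements ExampleAuthorizer {
    public String authorize(String resource) { return "approved:" + resource; }
    public String describe() { return "simple"; }
}

final class CachedMethodHandler implements InvocationHandler {
    // Resolved once at class initialization instead of on every invocation.
    private static final Method AUTHORIZE_METHOD;
    static {
        try {
            AUTHORIZE_METHOD = ExampleAuthorizer.class.getMethod("authorize", String.class);
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final ExampleAuthorizer delegate;
    private int intercepted = 0;

    CachedMethodHandler(ExampleAuthorizer delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Method.equals() is cheap; no Class.getMethod() call per invocation.
        if (AUTHORIZE_METHOD.equals(method)) {
            intercepted++;
        }
        try {
            return method.invoke(delegate, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    // Number of authorize() calls seen by this handler.
    int interceptedCalls() {
        return intercepted;
    }

    // Wraps the handler's delegate in a dynamic proxy.
    static ExampleAuthorizer wrap(CachedMethodHandler handler) {
        return (ExampleAuthorizer) Proxy.newProxyInstance(
                ExampleAuthorizer.class.getClassLoader(),
                new Class<?>[] {ExampleAuthorizer.class},
                handler);
    }
}
```

The comparison against the well-known method still happens on every call, 
but the expensive lookup is done only once.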

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
