[
https://issues.apache.org/jira/browse/NIFI-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448742#comment-16448742
]
Otto Fowler commented on NIFI-5112:
-----------------------------------
Do you have any reports that can show the problem and be used to compare
results and measure the improvement?
> Inefficiency in replicating requests across cluster
> ---------------------------------------------------
>
> Key: NIFI-5112
> URL: https://issues.apache.org/jira/browse/NIFI-5112
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
>
> When replicating requests across the cluster, we do some things that are
> rather inefficient, which can cause the UI to feel sluggish. Because all of
> this is done while the UI awaits a response, we need to ensure that this area
> of the application is very responsive. Through profiling and code review, I
> have identified the following places where we can improve our efficiency:
> * Use of Jersey Client. Jersey Client provides a very easy-to-use API that
> is very powerful. It provides a lot of capabilities to scan class paths and
> automatically detect interceptors, etc. However, doing this comes at a cost.
> Profiling shows that, on average, on my laptop replicating a single request
> took about 100 milliseconds, 100% of which was spent actually constructing
> the Jersey objects. Less than 1 millisecond of time was spent writing the
> message to the socket, awaiting the reply, and parsing the response. By using
> a different client, we can significantly improve this.
> * Flow Serialization holds a Flow Controller Read lock for the entire
> duration. This means that we block any mutating operations, such as HTTP PUT
> requests, while we build the appropriate DOM object for the flow, transform
> that DOM object into a String, and write that String to the output stream
> (including compression). We should be able to hold the Read Lock only while
> building the appropriate DOM object and then perform the
> transformation/serialization outside of the lock.
> * Template Serialization is inefficient. Currently, for each template, we
> serialize the DTO object to a String, then Deserialize that String into a DOM
> object (all of this is done in order to avoid XML-based injection attacks).
> We then add that DOM object into our flow's DOM object. We should instead
> hold onto/cache that DOM object so that we can cut out all of the above for
> all but the first iteration.
> * ReflectionUtils is used when a Processor is created in order to call any
> method annotated with @OnAdded. The implementation uses some Spring-based
> reflection utils in order to find any sort of Bridged methods. Doing this is
> expensive (on the order of 1 ms on my laptop). While this may not sound like
> a concern, that means that importing a template consisting of 5,000
> processors will take 5 seconds just to find annotated methods, all within the
> context of a web request. Since these methods will not change, we should
> instead cache a list of Methods that contain the annotations so that we don't
> have to constantly look these up.
> * Authorization uses InvocationHandlers. These InvocationHandlers use
> reflection to compare the method being called to a well-known method. The
> call to Method.equals() is not expensive. However, the call to
> Class.getMethod() is expensive and is done for every single authorization
> check, which can amount to a significant amount of time being spent. Instead,
> we can store the method of interest in a member variable and reference that.
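
The caching idea in the ReflectionUtils item above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the NiFi
implementation: the class name AnnotatedMethodCache and its methodsFor method
are hypothetical, the annotation type is passed in so the sketch does not
depend on NiFi's @OnAdded, and bridge methods are simply skipped rather than
resolved the way the Spring-based utilities do.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: remembers, per class, which public methods carry a
// given annotation, so the reflective scan runs once per class rather than
// once per component instance.
final class AnnotatedMethodCache {
    private final Class<? extends Annotation> annotationType;
    private final ConcurrentMap<Class<?>, List<Method>> cache = new ConcurrentHashMap<>();

    AnnotatedMethodCache(Class<? extends Annotation> annotationType) {
        this.annotationType = annotationType;
    }

    // Returns the cached list, scanning the class only on the first call.
    List<Method> methodsFor(Class<?> componentClass) {
        return cache.computeIfAbsent(componentClass, this::scan);
    }

    // The expensive part: walk all public methods and check for the annotation.
    private List<Method> scan(Class<?> componentClass) {
        List<Method> found = new ArrayList<>();
        for (Method method : componentClass.getMethods()) {
            // Assumption: ignoring compiler-generated bridge methods is
            // sufficient for this sketch.
            if (!method.isBridge() && method.isAnnotationPresent(annotationType)) {
                found.add(method);
            }
        }
        return Collections.unmodifiableList(found);
    }
}
```

With something like this, creating the 5,000th processor of a given class
would cost a map lookup instead of another scan. The cache is keyed by Class,
so a real implementation would also need to consider class unloading (for
example when a NAR is reloaded), which this sketch ignores.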