[ https://issues.apache.org/jira/browse/NIFI-9382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447597#comment-17447597 ]

ASF subversion and git services commented on NIFI-9382:
-------------------------------------------------------

Commit 839fbf7d19a428069355d7bf79b8df7fa68b30a3 in nifi's branch 
refs/heads/main from markap14
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=839fbf7 ]

NIFI-9382: Created a new ClassloaderIsolationKey mechanism by which H… (#5533)

* NIFI-9382: Created a new ClassloaderIsolationKey mechanism by which Hadoop 
related processors (and potentially others) can indicate that they need full 
classloaders to be cloned but can share with other instances in certain 
circumstances
- Added system tests

* NIFI-9382: Renamed interface based on review feedback

* NIFI-9382: Removed ReentrantKerberosUser.

> Improve startup time when loading flow that uses many HDFS related processors
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-9382
>                 URL: https://issues.apache.org/jira/browse/NIFI-9382
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 1.16.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When starting NiFi, if a flow has many HDFS-related processors (hundreds to 
> thousands), the startup time can be very long. In one case, a user's flow 
> has more than 1000 HDFS processors and NiFi takes 1-2 hours to fully start.
> This is because the HDFS client makes a lot of assumptions about the environment 
> that it's running in. These assumptions are not always true, unfortunately, 
> when running in NiFi. The use of static methods in the UserGroupInformation 
> class means that in order to interact with an HDFS cluster using multiple 
> Kerberos Principals, we have to create ClassLoader isolation, using a 
> separate, duplicate ClassLoader for each HDFS processor.
> Because of this, the HDFS client components must be initialized once for each 
> processor, and the initialization of the client is very expensive. We need to 
> improve this so that we don't create a separate ClassLoader that loads 
> hundreds or thousands of classes for each instance of the Processor.
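
For readers of the archive, the idea in the commit message (clone a full
classloader, but let instances share it "in certain circumstances") can be
sketched roughly as below. This is only an illustration under assumptions, not
the code from the commit: the names IsolationKeyProvider, SharedLoaderRegistry
and loaderFor are hypothetical, and using the Kerberos principal as the key is
an assumption based on the issue description.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these types are NiFi's actual API.
final class IsolationKeySketch {

    /** A component reports a key; components with equal keys may share one cloned loader. */
    interface IsolationKeyProvider {
        String getIsolationKey(); // null means "do not share with anyone"
    }

    static final class SharedLoaderRegistry {
        private final Map<String, ClassLoader> loadersByKey = new ConcurrentHashMap<>();
        private final AtomicInteger clones = new AtomicInteger();
        private final URL[] resources;
        private final ClassLoader parent;

        SharedLoaderRegistry(URL[] resources, ClassLoader parent) {
            this.resources = resources.clone();
            this.parent = parent;
        }

        /** Returns the loader shared by all components with this key, cloning one on first use. */
        ClassLoader loaderFor(IsolationKeyProvider component) {
            String key = component.getIsolationKey();
            if (key == null) {
                return newClone(); // no key: always a fresh, fully isolated loader
            }
            return loadersByKey.computeIfAbsent(key, k -> newClone());
        }

        int clonedLoaderCount() {
            return clones.get();
        }

        private ClassLoader newClone() {
            clones.incrementAndGet();
            // A new loader over the same resources re-loads the classes, so static
            // state (such as a logged-in user) is separate for each loader.
            return new URLClassLoader(resources, parent);
        }
    }

    public static void main(String[] args) {
        SharedLoaderRegistry registry =
                new SharedLoaderRegistry(new URL[0], ClassLoader.getSystemClassLoader());
        // Assumption for illustration: the key is the Kerberos principal.
        for (int i = 0; i < 1000; i++) {
            registry.loaderFor(() -> "nifi@EXAMPLE.COM");
        }
        registry.loaderFor(() -> "etl@EXAMPLE.COM");
        System.out.println("loaders cloned: " + registry.clonedLoaderCount());
    }
}
```

With a scheme like this, the number of expensive client initializations would
follow the number of distinct keys rather than the number of processors.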



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
