Mark Payne created NIFI-9382:
--------------------------------
Summary: Improve startup time when loading flow that uses many
HDFS related processors
Key: NIFI-9382
URL: https://issues.apache.org/jira/browse/NIFI-9382
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework, Extensions
Reporter: Mark Payne
Assignee: Mark Payne
When starting NiFi, if a flow has many HDFS related processors (hundreds to
thousands), the startup time can be very long. In one case, a user's flow
contains more than 1,000 HDFS processors, and NiFi takes 1-2 hours to fully start.
This is because the HDFS client library makes a lot of assumptions about the
environment that it's running in, and unfortunately these assumptions do not
always hold when running in NiFi. UserGroupInformation keeps its Kerberos login
state behind static methods, so all code that uses that class within a single
ClassLoader shares the same static login state. To interact with HDFS clusters
using multiple Kerberos principals, we therefore have to create ClassLoader
isolation, using a separate, duplicate ClassLoader for each HDFS processor.
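To illustrate why static login state forces per-processor isolation, here is a minimal sketch. It does not use the real Hadoop API: StaticLoginState is a hypothetical stand-in for a class that, like UserGroupInformation, stores the logged-in principal in a static field. When two processors configured with different principals share one ClassLoader, the second login overwrites the first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a library class that keeps login state in a static
// field, similar in spirit to Hadoop's UserGroupInformation. NOT the real API.
final class StaticLoginState {
    // One value per loaded copy of this class, i.e. per ClassLoader.
    private static String loginPrincipal;

    static void login(String principal) {
        loginPrincipal = principal;
    }

    static String currentPrincipal() {
        return loginPrincipal;
    }
}

public class SharedStaticStateDemo {
    // Simulates two processors with different principals running in the same ClassLoader.
    static List<String> runTwoProcessors() {
        StaticLoginState.login("alice@EXAMPLE.COM"); // processor A logs in
        StaticLoginState.login("bob@EXAMPLE.COM");   // processor B logs in, overwriting A
        List<String> observed = new ArrayList<>();
        observed.add(StaticLoginState.currentPrincipal()); // what processor A now sees
        observed.add(StaticLoginState.currentPrincipal()); // what processor B sees
        return observed;
    }

    public static void main(String[] args) {
        // Both processors observe bob's login, so processor A would act as the wrong principal.
        System.out.println(runTwoProcessors());
    }
}
```

Loading a separate copy of the class in each processor's own ClassLoader gives each copy its own static field, which is the isolation described above, at the cost of loading and initializing the client classes again for every processor.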
Because of this, the HDFS client components must be initialized once for each
processor, and the initialization of the client is very expensive. We need to
improve this so that we don't create a separate ClassLoader that loads hundreds
or thousands of classes for each instance of the Processor.
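One possible direction, sketched here under stated assumptions rather than as the actual change made for this issue, is to share an isolated ClassLoader among all processor instances that need the same isolation, for example the same Kerberos principal, instead of creating one per instance. The class IsolatedClassLoaderCache, the string isolation key, and the hdfsClientUrls parameter are all hypothetical; closing loaders once no processor uses them is omitted.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one isolated ClassLoader per isolation key (assumed here
// to be the Kerberos principal), shared by every processor with that key.
public class IsolatedClassLoaderCache {
    private final Map<String, ClassLoader> loadersByKey = new ConcurrentHashMap<>();
    private final URL[] hdfsClientUrls; // assumed: URLs of the HDFS client JARs
    private final ClassLoader parent;

    public IsolatedClassLoaderCache(URL[] hdfsClientUrls, ClassLoader parent) {
        this.hdfsClientUrls = hdfsClientUrls.clone();
        this.parent = parent;
    }

    // Returns the loader for the key, creating (and paying the load cost) only on first use.
    public ClassLoader getOrCreate(String isolationKey) {
        return loadersByKey.computeIfAbsent(isolationKey,
                key -> new URLClassLoader(hdfsClientUrls, parent));
    }

    public int size() {
        return loadersByKey.size();
    }

    public static void main(String[] args) {
        IsolatedClassLoaderCache cache =
                new IsolatedClassLoaderCache(new URL[0], ClassLoader.getSystemClassLoader());
        // 1,000 processors, but only two distinct principals.
        for (int i = 0; i < 1000; i++) {
            String principal = (i % 2 == 0) ? "alice@EXAMPLE.COM" : "bob@EXAMPLE.COM";
            cache.getOrCreate(principal);
        }
        System.out.println(cache.size()); // prints 2
    }
}
```

With this kind of sharing, the number of expensive client initializations scales with the number of distinct principals rather than the number of processors. Processors that share a key also share static state, which is acceptable only if that state is the same for all of them.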
--
This message was sent by Atlassian Jira
(v8.20.1#820001)