[
https://issues.apache.org/jira/browse/NIFI-12205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Payne updated NIFI-12205:
------------------------------
Status: Patch Available (was: Open)
> Improve Python processor startup/loading process
> ------------------------------------------------
>
> Key: NIFI-12205
> URL: https://issues.apache.org/jira/browse/NIFI-12205
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 2.latest
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> At present, whenever NiFi starts, it detects all Python Processors in the
> configured directories. It then uses {{pip}} to install all third-party
> dependencies. This is done on startup because when a Python Processor is
> created, those dependencies need to be available in order to parse/load the
> Python module.
> However, this can cause NiFi to take several minutes on startup if there are
> many Python Processors with complex dependencies. Additionally, each time
> that a Processor is created, it can be very slow as it loads the Python
> module. This approach can also take a lot of space in the {{work/}}
> directory, because it downloads all dependencies, regardless of whether or
> not the Processor is used in the flow.
> We need to refactor this such that on startup, NiFi detects which Processors
> are available but does not load their dependencies. Instead, when a Processor
> is created, if its dependencies have not yet been loaded, they should be
> loaded at that time. However, this cannot happen in the thread that creates
> the Processor, as it would cause web requests to time out and eventually
> result in cluster instability.
> Instead, we need to kick off a background thread that is responsible for
> downloading any third-party dependencies and loading the Python module. Until
> all of that happens, the Processor should be considered Invalid. This allows
> the user to see in the UI that the Processor is not yet ready for use, and
> the Validation Result's explanation should state why the Processor is
> invalid (e.g., downloading third-party dependencies, failed to load Python
> code, etc.).
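
The background-loading idea described above could be sketched roughly as follows. This is an illustrative Python model only, not NiFi's actual implementation: the names `LazyProcessor`, `LoadState`, `validate`, and `wait_until_loaded` are hypothetical, and the `install_dependencies` and `load_module` callables stand in for whatever the framework would really do (for example, invoking {{pip}} and importing the module).

```python
import threading
from enum import Enum


class LoadState(Enum):
    # Hypothetical states; the values double as validation explanations.
    DOWNLOADING_DEPENDENCIES = "Downloading third-party dependencies"
    LOADING_MODULE = "Loading Python module"
    READY = "Ready"
    FAILED = "Failed to load Python code"


class LazyProcessor:
    """Hypothetical stand-in for a Processor whose setup runs in the background."""

    def __init__(self, name, install_dependencies, load_module):
        self.name = name
        self._install = install_dependencies  # assumed callable, e.g. runs pip
        self._load = load_module              # assumed callable, e.g. imports the module
        self._lock = threading.Lock()
        self._state = LoadState.DOWNLOADING_DEPENDENCIES
        self._error = None
        # The creating thread returns immediately; the slow work happens here.
        self._thread = threading.Thread(target=self._initialize, daemon=True)
        self._thread.start()

    def _set_state(self, state, error=None):
        with self._lock:
            self._state = state
            self._error = error

    def _initialize(self):
        try:
            self._install()
            self._set_state(LoadState.LOADING_MODULE)
            self._load()
            self._set_state(LoadState.READY)
        except Exception as exc:
            self._set_state(LoadState.FAILED, str(exc))

    def validate(self):
        """Return (is_valid, explanation) without blocking the caller."""
        with self._lock:
            state, error = self._state, self._error
        if state is LoadState.READY:
            return True, ""
        if state is LoadState.FAILED:
            return False, f"{state.value}: {error}"
        return False, state.value

    def wait_until_loaded(self, timeout=None):
        """Block until background setup finishes (useful mainly in tests)."""
        self._thread.join(timeout)
```

In this sketch, creating a `LazyProcessor` never blocks on dependency installation, and `validate()` reports the Processor as invalid, with an explanation of the current step or failure, until the background thread has finished.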