[ https://issues.apache.org/jira/browse/NIFI-12205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775306#comment-17775306 ]
ASF subversion and git services commented on NIFI-12205:
--------------------------------------------------------
Commit cbdf32ab79fe2041a167bee9abf69f68d91a3be6 in nifi's branch
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=cbdf32ab79 ]
NIFI-12205: Moved loading of Python dependencies into background thread when
processor created instead of during startup. Some code cleanup.
This closes #7863
Signed-off-by: David Handermann <[email protected]>
> Improve Python processor startup/loading process
> ------------------------------------------------
>
> Key: NIFI-12205
> URL: https://issues.apache.org/jira/browse/NIFI-12205
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 2.latest
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> At present, whenever NiFi starts, it detects all Python Processors in the
> configured directories. It then uses {{pip}} to install all third-party
> dependencies. This is done on startup because when a Python Processor is
> created, those dependencies need to be available in order to parse/load the
> Python module.
> However, this can very quickly cause NiFi to take several minutes on startup
> if there are many Python Processors with complex dependencies. Additionally,
> each time that a Processor is created, it can be very slow as it loads the
> Python module. This can also take a lot of space in the {{work/}} directory,
> because it downloads all dependencies, regardless of whether or not the
> Processor is used in the flow.
> We need to refactor this such that on startup, NiFi detects which Processors
> are available but does not load their dependencies. Instead, when a Processor
> is created, if its dependencies have not yet been loaded, they should be
> loaded at that time. However, this cannot happen in the Thread that creates
> the Processor, as it would cause web requests to time out and eventually
> result in cluster instability.
> Instead, we need to kick off a background thread that is responsible for
> downloading any third-party dependencies and loading the Python module. Until
> all of that happens, the Processor should be considered Invalid. This allows
> the user to see in the UI that the Processor is not yet ready for use,
> and the Validation Result's explanation should state why the Processor is
> invalid (e.g., downloading third-party dependencies, failed to load Python
> code, etc.).
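
The background-loading approach described above can be illustrated with a
minimal Python sketch. It is not the code from the commit: the class
LazyProcessor, the LoadState enum, and the two callables passed to the
constructor are hypothetical names chosen for illustration. The sketch starts
a background thread when the processor is created and reports the processor
as invalid, with an explanation, until loading finishes or fails.

```python
import threading
from enum import Enum


class LoadState(Enum):
    # Explanations shown to the user while the processor is not yet usable.
    DOWNLOADING_DEPENDENCIES = "Downloading third-party dependencies"
    LOADING_MODULE = "Loading Python module"
    READY = "Ready"
    FAILED = "Failed to load Python code"


class LazyProcessor:
    """Hypothetical stand-in for a processor that loads in the background."""

    def __init__(self, install_dependencies, load_module):
        # Both arguments are caller-supplied callables (assumptions for this
        # sketch); a real system would run pip and import the module here.
        self._install_dependencies = install_dependencies
        self._load_module = load_module
        self._lock = threading.Lock()
        self._state = LoadState.DOWNLOADING_DEPENDENCIES
        self._error = None
        # Creation returns immediately; the slow work runs off the caller's thread.
        self._thread = threading.Thread(target=self._initialize, daemon=True)
        self._thread.start()

    def _set_state(self, state, error=None):
        with self._lock:
            self._state = state
            self._error = error

    def _initialize(self):
        try:
            self._install_dependencies()
            self._set_state(LoadState.LOADING_MODULE)
            self._load_module()
            self._set_state(LoadState.READY)
        except Exception as exc:
            self._set_state(LoadState.FAILED, exc)

    def validate(self):
        """Return (is_valid, explanation) based on the current load state."""
        with self._lock:
            state, error = self._state, self._error
        if state is LoadState.READY:
            return True, ""
        if state is LoadState.FAILED:
            return False, f"{state.value}: {error}"
        return False, state.value

    def wait_until_initialized(self, timeout=None):
        """Block until the background thread finishes (useful in tests)."""
        self._thread.join(timeout)
```

In this sketch the thread that creates the processor never blocks on the
download, and validate() can be polled at any time to obtain the current
explanation.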
--
This message was sent by Atlassian Jira
(v8.20.10#820010)