[ https://issues.apache.org/jira/browse/NIFI-12205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775306#comment-17775306 ]

ASF subversion and git services commented on NIFI-12205:
--------------------------------------------------------

Commit cbdf32ab79fe2041a167bee9abf69f68d91a3be6 in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=cbdf32ab79 ]

NIFI-12205: Moved loading of Python dependencies into background thread when 
processor created instead of during startup. Some code cleanup.

This closes #7863

Signed-off-by: David Handermann <[email protected]>


> Improve Python processor startup/loading process
> ------------------------------------------------
>
>                 Key: NIFI-12205
>                 URL: https://issues.apache.org/jira/browse/NIFI-12205
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 2.latest
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> At present, whenever NiFi starts, it detects all Python Processors in the 
> configured directories. It then uses {{pip}} to install all third-party 
> dependencies. This is done on startup because when a Python Processor is 
> created, those dependencies need to be available in order to parse/load the 
> Python module.
> However, this can very quickly cause NiFi to take several minutes on startup 
> if there are many Python Processors with complex dependencies. Additionally, 
> creating a Processor can be very slow because the Python module must be 
> loaded. Installing everything up front also takes a lot of space in the 
> {{work/}} directory, because NiFi downloads all dependencies regardless of 
> whether the Processor is used in the flow.
> We need to refactor this such that on startup, NiFi detects which Processors 
> are available but does not load their dependencies. Instead, when a Processor 
> is created, if its dependencies have not yet been loaded, they should be 
> loaded at that time. However, this cannot happen in the Thread that creates 
> the Processor, as it would cause web requests to time out and eventually 
> result in cluster instability.
> Instead, we need to kick off a background thread that is responsible for 
> downloading any third-party dependencies and loading the Python module. Until 
> all of that happens, the Processor should be considered Invalid. This allows 
> the user to see in the UI that the Processor is not yet ready for use, 
> and the Validation Result's explanation should explain why the Processor is 
> invalid (e.g., Downloading third-party dependencies, failed to load Python 
> code, etc.).
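
The approach described above (return from Processor creation immediately, do the slow work on a background thread, and report the Processor as invalid with an explanation until that work finishes) can be sketched roughly as follows. This is only an illustration of the pattern, not NiFi's actual implementation: the class `LazyProcessor`, the `LoadState` enum, and the `install_dependencies` / `load_module` callables are all hypothetical names invented for this sketch.

```python
import threading
from enum import Enum


class LoadState(Enum):
    # Hypothetical states; the values double as validation explanations.
    DOWNLOADING_DEPENDENCIES = "Downloading third-party dependencies"
    LOADING_MODULE = "Loading Python module"
    READY = "Ready"
    FAILED = "Failed to load Python code"


class LazyProcessor:
    """Hypothetical stand-in for a Processor whose setup runs off the creating thread."""

    def __init__(self, install_dependencies, load_module):
        # Both arguments are assumed callables, e.g. a wrapper around pip
        # and a function that imports the Processor's Python module.
        self._install_dependencies = install_dependencies
        self._load_module = load_module
        self._lock = threading.Lock()
        self._state = LoadState.DOWNLOADING_DEPENDENCIES
        self._error = None
        # Creation returns immediately; the slow work happens in the background.
        self._thread = threading.Thread(target=self._initialize, daemon=True)
        self._thread.start()

    def _set_state(self, state, error=None):
        with self._lock:
            self._state = state
            self._error = error

    def _initialize(self):
        try:
            self._install_dependencies()
            self._set_state(LoadState.LOADING_MODULE)
            self._load_module()
            self._set_state(LoadState.READY)
        except Exception as exc:
            # Keep the failure so validation can explain it to the user.
            self._set_state(LoadState.FAILED, exc)

    def validate(self):
        """Return (valid, explanation), analogous to a validation result in the UI."""
        with self._lock:
            state, error = self._state, self._error
        if state is LoadState.READY:
            return True, ""
        if state is LoadState.FAILED:
            return False, f"{state.value}: {error}"
        return False, state.value

    def wait_until_initialized(self, timeout=None):
        # Convenience for tests and demos; a real caller would poll validate().
        self._thread.join(timeout)
```

In this sketch a caller never blocks on dependency installation: it creates the object, then polls `validate()` and shows the returned explanation until the first element becomes `True`.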



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
