[ 
https://issues.apache.org/jira/browse/MINIFICPP-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828351#comment-17828351
 ] 

Gábor Gyimesi edited comment on MINIFICPP-2304 at 3/19/24 2:06 PM:
-------------------------------------------------------------------

h1. Python processor initialization cleanup

Issue: All Python processors are initialized multiple times, for the following 
reasons:
1. Initializing to set the Script File property
2. Initializing to register the processor for the agent manifest
3. Initializing when creating the processor from the flow config. -> This is the 
only initialization that should be done.

The first two initializations are performed for every processor. Each one 
evaluates the Python processor script, which requires all of the script's 
dependencies to be installed.

Goal: Python processors should only be initialized if they are used by the flow 
configuration. This would allow us to install dependencies only for the 
processors that are actually used in the flow configuration.

h2. Initialization suggested changes

h3. Script File property setting

When the `Script File` property value is set in the `PythonObjectFactory`, the 
processor is initialized to set the supported properties. In this case it 
shouldn't be necessary to evaluate the script file; the initialization should 
only take place when the processor is first used.

Possible solution: We can change this to only set the supported properties in 
this scenario, so the script evaluation won't be run.
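
As a rough illustration of this idea (not the actual MiNiFi C++ code), the sketch below defers script evaluation until first use. The class `LazyPythonProcessor`, its `load_script` callback, and the hard-coded property list are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Optional


class LazyPythonProcessor:
    """Hypothetical stand-in for a processor created by the object factory."""

    def __init__(self, script_path: str, load_script: Callable[[str], object]):
        # Setting the script path only records it; nothing is evaluated here.
        self.script_path = script_path
        # The loader is injected so the sketch runs without a real script.
        self._load_script = load_script
        self._module: Optional[object] = None
        # Assumed property list: known without evaluating the script.
        self.supported_properties: List[str] = ["Script File"]

    def ensure_loaded(self) -> object:
        # Evaluate the script once, the first time it is actually needed.
        if self._module is None:
            self._module = self._load_script(self.script_path)
        return self._module

    def on_trigger(self) -> object:
        # First real use of the processor triggers the script evaluation.
        return self.ensure_loaded()


if __name__ == "__main__":
    evaluated = []

    def fake_loader(path: str) -> dict:
        evaluated.append(path)
        return {"path": path}

    processor = LazyPythonProcessor("/tmp/example_processor.py", fake_loader)
    print("evaluations after construction:", len(evaluated))
    processor.on_trigger()
    processor.on_trigger()
    print("evaluations after two triggers:", len(evaluated))
```

With this shape, processors that are registered but never used by the flow never evaluate their script, so their dependencies are never imported.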

h3. Registering processor for agent manifest

When the `PythonCreator` registers the Python processor classes, it also 
retrieves the description and the properties of the Python processors for the 
manifest using the `ExternalBuildDescription::addExternalComponent` method. 
This requires evaluating the script file and running the `describe` and 
`onInitialize` functions of the processor. This must be done for all processors 
to create the manifest for C2.
 - For MiNiFi Python processors we need to call the `onInitialize` and 
`describe` methods to get the supported properties and the description of the 
processor. This cannot be avoided.
 - For NiFi Python processors we can avoid calling the `describe` method by 
using the `ast` module to get the description from the `ProcessorDetails` class.
 - We cannot really avoid calling the `onInitialize` method for NiFi 
processors, as it is more complex to find and parse the `PropertyDescriptor` 
objects from the script file.
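
The `ast` approach for NiFi processors could look roughly like the sketch below. It parses the script source without executing it, so imports of uninstalled dependencies are never run. The function name `read_processor_description` and the example processor source are assumptions made for this illustration; the sketch only handles a `description` that is a plain string literal.

```python
import ast
from typing import Optional


def read_processor_description(source: str) -> Optional[str]:
    """Return ProcessorDetails.description from source without running it."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.ClassDef) and node.name == "ProcessorDetails"):
            continue
        for stmt in node.body:
            if not isinstance(stmt, ast.Assign):
                continue
            for target in stmt.targets:
                if isinstance(target, ast.Name) and target.id == "description":
                    try:
                        value = ast.literal_eval(stmt.value)
                    except (ValueError, TypeError):
                        return None  # not a plain literal, give up
                    return value if isinstance(value, str) else None
    return None


# Hypothetical NiFi-style processor; the first import would fail if executed.
EXAMPLE_SOURCE = '''
import some_heavy_dependency_that_is_not_installed

from nifiapi.flowfiletransform import FlowFileTransform


class ExampleProcessor(FlowFileTransform):
    class Java:
        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]

    class ProcessorDetails:
        version = "0.0.1"
        description = "Example processor used only for this sketch."

    def transform(self, context, flowfile):
        pass
'''

if __name__ == "__main__":
    print(read_processor_description(EXAMPLE_SOURCE))
```

The same trick is harder for properties: `PropertyDescriptor` objects are built by constructor calls whose arguments can be arbitrary expressions, which `ast.literal_eval` cannot evaluate, so a static parser would need to handle many more cases.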
 

The goal of this ticket is to only initialize the python processors that are in 
use, but this does not seem to be viable, so the initialization will not be 
changed at the moment.


was (Author: lordgamez):
# Python processor initialization cleanup

Issue: All Python processors are initialized multiple times, for the following 
reasons:
1. Initializing for Script File property setting
2. Initializing for registering processor for agent manifest
3. Initializing when creating processor from flow config. -> This is the only 
initialization that should be done.

The first two initializations are done for all processors, which involves 
evaluating the python processor script for which all the dependencies need to 
be installed.

Goal: Python processors should only be initialized if they are used by the flow 
configuration, this would allow us to only install the dependencies for the 
processors that are actually used in the flow configuration.

## Initialization suggested changes

### Script File property setting

When the `Script File` property value is set in the `PythonObjectFactory`, the 
processor is initialized to set the supported properties. In this case it 
shouldn't be necessary to evaluate the script file, the initialization should 
only take place when the processor is first used.

Possible solution: We can change this to only set the supported properties in 
this scenario, so the script evaluation won't be run.

### Registering processor for agent manifest

When the `PythonCreator` registers the python processor classes, it also 
retrieves the description and the properties of the python processors for the 
manifest using the `ExternalBuildDescription::addExternalComponent` method. 
This requires the eval of the script file and running the describe and 
onInitialize functions of the processor. This must be done for all processors 
to create the manifest for C2.

- For MiNiFi python processors we need to call the `onInitialize` and 
`describe` method to get the supported properties and the description of the 
processor, this cannot be avoided.
- For NiFi python processors we can avoid calling the `describe` method by 
using the ast module to get the description from ProcessorDetails class.
- We cannot really avoid calling the `onInitialize` method for NiFi processors, 
as it is more complex to find and parse the PropertyDescriptor objects from the 
script file.
 
The goal of this ticket is to only initialize the python processors that are in 
use, but this does not seem to be viable, so the initialization will not be 
changed at the moment.

> Clean up Python processor initialization
> ----------------------------------------
>
>                 Key: MINIFICPP-2304
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-2304
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: Improvement
>            Reporter: Gábor Gyimesi
>            Assignee: Gábor Gyimesi
>            Priority: Major
>
> Python processor initialization should be refactored to be cleaner. We 
> instantiate the Python processors twice:
> We instantiate the Python processors that are used in the configured MiNiFi 
> flow. This is straightforward and not problematic.
> The problem is that before that we also instantiate all python processors in 
> the PythonCreator::registerScriptDescription method for getting the class 
> description for all available python processors for the agent manifest
>  * In this scenario we call the Python processors' initialize method twice:
>  ** Once the PythonObjectFactory::create method calls it to initialize the 
> supported properties to set the ScriptFile property to the path of the Python 
> processor
>  ** After this the PythonCreator::registerScriptDescription also calls it 
> explicitly to load the python processor from the set path
>  ** This should be circumvented to not need double initialization and have a 
> telling warning message in ExecutePythonProcessor::initialize() if the 
> loadScript method fails
>  * We should also find a way to avoid initializing all the Python processors 
> and retrieve the processor data without it. With NiFi Python processors a way 
> for this could be to use the "ast" python module to retrieve the processor 
> details which does not require loading the python module



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
