[
https://issues.apache.org/jira/browse/MINIFICPP-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828351#comment-17828351
]
Gábor Gyimesi commented on MINIFICPP-2304:
------------------------------------------
# Python processor initialization cleanup
Issue: All Python processors are initialized multiple times, for the following
reasons:
1. Initialization to set the `Script File` property
2. Initialization to register the processor for the agent manifest
3. Initialization when the processor is created from the flow config. This is
the only initialization that should be done.
The first two initializations are done for all processors. Each of them
evaluates the Python processor script, which requires all of the script's
dependencies to be installed.
Goal: Python processors should only be initialized if they are used by the flow
configuration. This would allow us to install the dependencies only for the
processors that are actually used in the flow configuration.
## Initialization suggested changes
### Script File property setting
When the `Script File` property value is set in the `PythonObjectFactory`, the
processor is initialized to set the supported properties. Evaluating the script
file should not be necessary in this case; the initialization should only take
place when the processor is first used.
Possible solution: We can change this to only set the supported properties in
this scenario, so the script evaluation won't be run.
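A minimal sketch of this idea, written in Python purely for illustration (the actual `PythonObjectFactory` and processor classes are C++). The class `LazyScriptProcessor`, its methods, and the `load_script` callback are hypothetical names and do not exist in MiNiFi; the sketch only shows setting the property without evaluating the script, and deferring evaluation to first use.

```python
from typing import Callable, Dict, Optional


class LazyScriptProcessor:
    """Hypothetical stand-in for a script-backed processor (not a MiNiFi class)."""

    def __init__(self, load_script: Callable[[str], object]) -> None:
        # load_script is an assumed callback that evaluates the script file.
        self._load_script = load_script
        # Only the supported properties are registered up front.
        self._properties: Dict[str, Optional[str]] = {"Script File": None}
        self._module: Optional[object] = None

    def set_property(self, name: str, value: str) -> None:
        # Setting a property never triggers script evaluation.
        if name not in self._properties:
            raise KeyError(f"unsupported property: {name}")
        self._properties[name] = value

    def ensure_loaded(self) -> object:
        # The script is evaluated once, on first use.
        if self._module is None:
            path = self._properties["Script File"]
            if path is None:
                raise ValueError("Script File property is not set")
            self._module = self._load_script(path)
        return self._module
```

With this shape, a factory could set `Script File` on every available processor without any script being evaluated, and only the processors that the flow actually runs would reach `ensure_loaded`.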
### Registering processor for agent manifest
When the `PythonCreator` registers the python processor classes, it also
retrieves the description and the properties of the python processors for the
manifest using the `ExternalBuildDescription::addExternalComponent` method.
This requires evaluating the script file and running the `describe` and
`onInitialize` functions of the processor. This must be done for all processors
to create the manifest for C2.
- For MiNiFi Python processors we need to call the `onInitialize` and
`describe` methods to get the supported properties and the description of the
processor. This cannot be avoided.
- For NiFi Python processors we can avoid calling the `describe` method by
using the `ast` module to get the description from the `ProcessorDetails` class.
- We cannot really avoid calling the `onInitialize` method for NiFi processors,
as it is more complex to find and parse the `PropertyDescriptor` objects from
the script file.
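The `ast` approach for NiFi processors could look roughly like the sketch below. It assumes the description is assigned as a plain literal to a `description` attribute of a nested `ProcessorDetails` class; the function name `find_processor_description` and the sample source are hypothetical and only serve the illustration. Because the source is parsed and never executed, the imports at the top of the script are not resolved, so the dependencies do not need to be installed.

```python
import ast
from typing import Optional

# Hypothetical processor source; the import is never executed by ast.parse.
SAMPLE_SOURCE = '''
import some_heavy_dependency

class ExampleProcessor(FlowFileTransform):
    class ProcessorDetails:
        version = "0.0.1"
        description = "Example description."
'''


def find_processor_description(source: str) -> Optional[str]:
    """Return the literal ProcessorDetails.description, or None if not found."""
    tree = ast.parse(source)
    for outer in ast.walk(tree):
        if not isinstance(outer, ast.ClassDef):
            continue
        for inner in outer.body:
            if not (isinstance(inner, ast.ClassDef) and inner.name == "ProcessorDetails"):
                continue
            for stmt in inner.body:
                if not isinstance(stmt, ast.Assign):
                    continue
                for target in stmt.targets:
                    if isinstance(target, ast.Name) and target.id == "description":
                        try:
                            # Only literal values can be recovered without running code.
                            value = ast.literal_eval(stmt.value)
                        except (ValueError, TypeError):
                            return None
                        return value if isinstance(value, str) else None
    return None
```

A description that is computed at runtime (for example, built by a function call) cannot be recovered this way and would yield `None`, which is also why extracting the `PropertyDescriptor` objects statically is harder.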
The goal of this ticket is to only initialize the python processors that are in
use, but this does not seem to be viable, so the initialization will not be
changed at the moment.
> Clean up Python processor initialization
> ----------------------------------------
>
> Key: MINIFICPP-2304
> URL: https://issues.apache.org/jira/browse/MINIFICPP-2304
> Project: Apache NiFi MiNiFi C++
> Issue Type: Improvement
> Reporter: Gábor Gyimesi
> Assignee: Gábor Gyimesi
> Priority: Major
>
> Python processor initialization should be refactored to be cleaner. We
> instantiate the Python processors twice:
> We instantiate the Python processors that are used in the configured MiNiFi
> flow. This is straightforward and not problematic.
> The problem is that before that we also instantiate all python processors in
> the PythonCreator::registerScriptDescription method for getting the class
> description for all available python processors for the agent manifest
> * In this scenario we call the Python processors' initialize method twice:
> ** Once the PythonObjectFactory::create method calls it to initialize the
> supported properties to set the ScriptFile property to the path of the Python
> processor
> ** After this the PythonCreator::registerScriptDescription also calls it
> explicitly to load the python processor from the set path
> ** This should be circumvented to not need double initialization and have a
> telling warning message in ExecutePythonProcessor::initialize() if the
> loadScript method fails
> * We should also find a way to avoid initializing all the Python processors
> and retrieve the processor data without it. With NiFi Python processors a way
> for this could be to use the "ast" python module to retrieve the processor
> details which does not require loading the python module
--
This message was sent by Atlassian Jira
(v8.20.10#820010)