fgerlits commented on code in PR #1712:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1712#discussion_r1497594578
##########
extensions/python/PYTHON.md:
##########
@@ -106,20 +127,39 @@ class VaderSentiment(object):
To enable python Processor capabilities, the following options need to be provided in minifi.properties. The directory specified
can contain processors. Note that the processor name will be the reference in your flow. Directories are treated like package names.
Therefore if the nifi.python.processor.dir is /tmp/ and you have a subdirectory named packagedir with the file name file.py, it will
-produce a processor with the name org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory will append a package
-to the reference class name.
+produce a processor with the name org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory will append a package
+to the reference class name.
in minifi.properties
#directory where processors exist
nifi.python.processor.dir=XXXX
-
-
+
+
## Processors
The python directory (extensions/pythonprocessors) contains implementations that will be available for flows if the required dependencies exist.
-
-## Sentiment Analysis
+
+### Sentiment Analysis
The SentimentAnalysis processor will perform a Vader Sentiment Analysis. This requires that you install nltk and VaderSentiment
pip install nltk
pip install VaderSentiment
+
+## Using NiFi Python Processors
+
+MiNiFi C++ supports the use of NiFi Python processors, that are inherited from the FlowFileTransform base class. To use these processors, you must copy the Python processor module under the nifi_python_processors directory located under the python directory (by default it can be found at ${minifi_root}/minifi-python/nifi_python_processors). To see how to write NiFi Python processors, please refer to the Python Developer Guide under the [Apache NiFi documentation](https://nifi.apache.org/documentation/v2/).
+
+In the flow configuration these Python processors can be referenced by their fully qualified class name, which looks like this: org.apache.nifi.minifi.processors.nifi_python_processors.<package_name>.<processor_name>. For example, the fully qualified class name of the PromptChatGPT processor implemented in the file nifi_python_processors/PromptChatGPT.py is org.apache.nifi.minifi.processors.nifi_python_processors.PromptChatGPT. If a processor is copied under a subdirectory, because it is part of a python submodule, the submodule name will be appended to the fully qualified class name. For example, if the QueryPinecone processor is implemented in the QueryPinecone.py file that is copied to nifi_python_processors/vectorstores/QueryPinecone.py, the fully qualified class name will be org.apache.nifi.minifi.processors.nifi_python_processors.vectorstores.QueryPinecone in the configuration file.
+
+**NOTE:** The name of the NiFi Python processor file should match the class name in the file, otherwise the processor will not be found.
+
+Due to some differences between the NiFi and MiNiFi C++ processors and implementation, there are some limitations using the NiFi Python processors:
+- Record based processors are not yet supported in MiNiFi C++, so the NiFi Python processors inherited from RecordTransform are not supported.
+- Virtualenv support is not yet available in MiNiFi C++, so all required packaged must be installed on the system.
Review Comment:
```suggestion
- Virtualenv support is not yet available in MiNiFi C++, so all required packages must be installed on the system.
```
##########
extensions/python/ExecutePythonProcessor.cpp:
##########
@@ -44,8 +45,7 @@ void ExecutePythonProcessor::initialize() {
try {
loadScript();
- } catch(const std::runtime_error&) {
-    logger_->log_warn("Could not load python script while initializing. In case of non-native python processor this is normal and will be done in the schedule phase.");
Review Comment:
Is this log not useful at all? Maybe we could change it to a lower level
(info or debug) instead of removing it.
##########
extensions/python/types/Types.h:
##########
@@ -160,7 +160,18 @@ class Long : public ReferenceHolder<reference_type> {
}
int64_t asInt64() {
- return static_cast<int64_t>(PyLong_AsLongLong(this->ref_.get()));
+ auto long_value = PyLong_AsLongLong(this->ref_.get());
+ if (long_value == -1 && PyErr_Occurred()) {
+ throw PyException();
+ }
+ return static_cast<int64_t>(long_value);
Review Comment:
I may be overly paranoid, but I'd prefer `gsl::narrow` instead of
`static_cast`.
##########
extensions/python/PYTHON.md:
##########
@@ -106,20 +127,39 @@ class VaderSentiment(object):
To enable python Processor capabilities, the following options need to be provided in minifi.properties. The directory specified
can contain processors. Note that the processor name will be the reference in your flow. Directories are treated like package names.
Therefore if the nifi.python.processor.dir is /tmp/ and you have a subdirectory named packagedir with the file name file.py, it will
-produce a processor with the name org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory will append a package
-to the reference class name.
+produce a processor with the name org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory will append a package
+to the reference class name.
in minifi.properties
#directory where processors exist
nifi.python.processor.dir=XXXX
-
-
+
+
## Processors
The python directory (extensions/pythonprocessors) contains implementations that will be available for flows if the required dependencies exist.
-
-## Sentiment Analysis
+
+### Sentiment Analysis
The SentimentAnalysis processor will perform a Vader Sentiment Analysis. This requires that you install nltk and VaderSentiment
pip install nltk
pip install VaderSentiment
+
+## Using NiFi Python Processors
+
+MiNiFi C++ supports the use of NiFi Python processors, that are inherited from the FlowFileTransform base class. To use these processors, you must copy the Python processor module under the nifi_python_processors directory located under the python directory (by default it can be found at ${minifi_root}/minifi-python/nifi_python_processors). To see how to write NiFi Python processors, please refer to the Python Developer Guide under the [Apache NiFi documentation](https://nifi.apache.org/documentation/v2/).
Review Comment:
I think this would be clearer:
```suggestion
MiNiFi C++ supports the use of NiFi Python processors, that are inherited from the FlowFileTransform base class. To use these processors, copy the Python processor module to the nifi_python_processors subdirectory of the python directory. By default, the python directory is ${minifi_root}/minifi-python. To see how to write NiFi Python processors, please refer to the Python Developer Guide under the [Apache NiFi documentation](https://nifi.apache.org/documentation/v2/).
```
##########
extensions/python/PythonScriptEngine.cpp:
##########
@@ -180,4 +198,66 @@ void PythonScriptEngine::evaluateModuleImports() {
}
}
+void PythonScriptEngine::initializeProcessorObject(const std::string& python_class_name) {
+ GlobalInterpreterLock gil;
+ if (auto python_class = bindings_[python_class_name]) {
+ auto num_args = [&]() -> size_t {
+ auto class_init = OwnedObject(PyObject_GetAttrString(python_class->get(), "__init__"));
+ if (!class_init.get()) {
+ return 0;
+ }
+
+ auto inspect_module = OwnedObject(PyImport_ImportModule("inspect"));
+ if (!inspect_module.get()) {
+ return 0;
+ }
+
+ auto inspect_args = OwnedObject(PyObject_CallMethod(inspect_module.get(), "getfullargspec", "O", class_init.get()));
+ if (!inspect_args.get()) {
+ return 0;
+ }
+
+ auto arg_list = OwnedObject(PyObject_GetAttrString(inspect_args.get(), "args"));
+ if (!arg_list.get()) {
+ return 0;
+ }
+
+ return PyList_Size(arg_list.get());
+ }();
+
+ if (num_args > 1) {
+ auto kwargs = OwnedDict::create();
+ auto value = OwnedObject(Py_None);
+ kwargs.put("jvm", value);
+ auto args = OwnedObject(PyTuple_New(0));
Review Comment:
should this be `PyTuple_New(num_args)`?
##########
extensions/python/ExecutePythonProcessor.cpp:
##########
@@ -146,11 +151,38 @@ std::unique_ptr<PythonScriptEngine> ExecutePythonProcessor::createScriptEngine()
auto engine = std::make_unique<PythonScriptEngine>();
python_logger_ = core::logging::LoggerFactory<ExecutePythonProcessor>::getAliasedLogger(getName());
- engine->initialize(Success, Failure, python_logger_);
+ engine->initialize(Success, Failure, Original, python_logger_);
return engine;
}
+core::Property* ExecutePythonProcessor::findProperty(const std::string& name) const {
+ if (auto prop_ptr = core::ConfigurableComponent::findProperty(name)) {
+ return prop_ptr;
+ }
+
+ auto it = ranges::find_if(python_properties_, [&name](const auto& item){
+ return item.getName() == name;
+ });
+ if (it != python_properties_.end()) {
+ return const_cast<core::Property*>(&*it);
+ }
+
+ return nullptr;
+}
+
+std::map<std::string, core::Property> ExecutePythonProcessor::getProperties() const {
+ auto result = ConfigurableComponent::getProperties();
+
+ std::lock_guard<std::mutex> lock(configuration_mutex_);
Review Comment:
Why do we lock this mutex here? It's not clear to me why the mutex is
`protected` instead of `private`, either.
Elsewhere, we read and write `python_properties_` without any locking. Maybe
we need a new mutex to protect `python_properties_`?
##########
extensions/python/types/PyProcessContext.cpp:
##########
@@ -65,12 +66,31 @@ PyObject* PyProcessContext::getProperty(PyProcessContext* self, PyObject* args)
return nullptr;
}
- const char* property;
- if (!PyArg_ParseTuple(args, "s", &property)) {
+ const char* property_name = nullptr;
+ PyObject* script_flow_file = nullptr;
+ if (!PyArg_ParseTuple(args, "s|O", &property_name, &script_flow_file)) {
throw PyException();
}
+
std::string value;
- context->getProperty(property, value);
+ if (!script_flow_file) {
+ if (!context->getProperty(property_name, value)) {
+ Py_RETURN_NONE;
+ }
+ } else {
+ auto py_flow = reinterpret_cast<PyScriptFlowFile*>(script_flow_file);
+ const auto flow_file = py_flow->script_flow_file_.lock();
+ if (!flow_file) {
+ PyErr_SetString(PyExc_AttributeError, "tried reading FlowFile outside 'on_trigger'");
+ return nullptr;
Review Comment:
What is the difference between `return nullptr` and `Py_RETURN_NONE`?
##########
libminifi/src/core/ConfigurableComponent.cpp:
##########
@@ -36,19 +36,28 @@ ConfigurableComponent::ConfigurableComponent()
ConfigurableComponent::~ConfigurableComponent() = default;
+Property* ConfigurableComponent::findProperty(const std::string& name) const {
Review Comment:
I don't like it that this const function returns a non-const pointer to its internals. Instead, we could have a virtual
```c++
const Property* ConfigurableComponent::findProperty(const std::string& name) const;
```
(without `const_cast`), and have a single `const_cast` in the non-virtual,
non-const
```c++
Property* ConfigurableComponent::findProperty(const std::string& name) {
const auto& const_self = *this;
return const_cast<Property*>(const_self.findProperty(name));
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]