fgerlits commented on code in PR #1712:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1712#discussion_r1497594578


##########
extensions/python/PYTHON.md:
##########
@@ -106,20 +127,39 @@ class VaderSentiment(object):
 To enable python Processor capabilities, the following options need to be 
provided in minifi.properties. The directory specified
 can contain processors. Note that the processor name will be the reference in 
your flow. Directories are treated like package names.
 Therefore if the nifi.python.processor.dir is /tmp/ and you have a 
subdirectory named packagedir with the file name file.py, it will
-produce a processor with the name 
org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory 
will append a package 
-to the reference class name. 
+produce a processor with the name 
org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory 
will append a package
+to the reference class name.
 
     in minifi.properties
        #directory where processors exist
        nifi.python.processor.dir=XXXX
-       
-       
+
+
 ## Processors
 The python directory (extensions/pythonprocessors) contains implementations 
that will be available for flows if the required dependencies
 exist.
-   
-## Sentiment Analysis
+
+### Sentiment Analysis
 
 The SentimentAnalysis processor will perform a Vader Sentiment Analysis. This 
requires that you install nltk and VaderSentiment
                pip install nltk
                pip install VaderSentiment
+
+## Using NiFi Python Processors
+
+MiNiFi C++ supports the use of NiFi Python processors, that are inherited from 
the FlowFileTransform base class. To use these processors, you must copy the 
Python processor module under the nifi_python_processors directory located 
under the python directory (by default it can be found at 
${minifi_root}/minifi-python/nifi_python_processors). To see how to write NiFi 
Python processors, please refer to the Python Developer Guide under the [Apache 
NiFi documentation](https://nifi.apache.org/documentation/v2/).
+
+In the flow configuration these Python processors can be referenced by their 
fully qualified class name, which looks like this: 
org.apache.nifi.minifi.processors.nifi_python_processors.<package_name>.<processor_name>.
 For example, the fully qualified class name of the PromptChatGPT processor 
implemented in the file nifi_python_processors/PromptChatGPT.py is 
org.apache.nifi.minifi.processors.nifi_python_processors.PromptChatGPT. If a 
processor is copied under a subdirectory, because it is part of a python 
submodule, the submodule name will be appended to the fully qualified class 
name. For example, if the QueryPinecone processor is implemented in the 
QueryPinecone.py file that is copied to 
nifi_python_processors/vectorstores/QueryPinecone.py, the fully qualified class 
name will be 
org.apache.nifi.minifi.processors.nifi_python_processors.vectorstores.QueryPinecone
 in the configuration file.
+
+**NOTE:** The name of the NiFi Python processor file should match the class 
name in the file, otherwise the processor will not be found.
+
+Due to some differences between the NiFi and MiNiFi C++ processors and 
implementation, there are some limitations using the NiFi Python processors:
+- Record based processors are not yet supported in MiNiFi C++, so the NiFi 
Python processors inherited from RecordTransform are not supported.
+- Virtualenv support is not yet available in MiNiFi C++, so all required 
packaged must be installed on the system.

Review Comment:
   ```suggestion
   - Virtualenv support is not yet available in MiNiFi C++, so all required 
packages must be installed on the system.
   ```



##########
extensions/python/ExecutePythonProcessor.cpp:
##########
@@ -44,8 +45,7 @@ void ExecutePythonProcessor::initialize() {
 
   try {
     loadScript();
-  } catch(const std::runtime_error&) {
-    logger_->log_warn("Could not load python script while initializing. In 
case of non-native python processor this is normal and will be done in the 
schedule phase.");

Review Comment:
   Is this log not useful at all?  Maybe we could change it to a lower level 
(info or debug) instead of removing it.
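   A minimal sketch of the lower-severity variant suggested above, with a stand-in logger type (MiNiFi's real `core::logging::Logger` is not reproduced here):
   ```cpp
   #include <iostream>
   #include <stdexcept>
   #include <string>

   // Stand-in for the MiNiFi logger, only for this sketch.
   struct StubLogger {
     std::string last_message;
     void log_debug(const std::string& msg) {
       last_message = msg;
       std::cout << "[debug] " << msg << '\n';
     }
   };

   // Simulates loadScript() failing, as it would for a non-native
   // python processor during initialization.
   void loadScript() { throw std::runtime_error("script not loaded yet"); }

   // Keeps the log line, but at debug level instead of warn.
   bool initializeKeepingLog(StubLogger& logger) {
     try {
       loadScript();
       return true;
     } catch (const std::runtime_error&) {
       logger.log_debug(
           "Could not load python script while initializing. In case of "
           "non-native python processor this is normal and will be done in "
           "the schedule phase.");
       return false;
     }
   }

   int main() {
     StubLogger logger;
     initializeKeepingLog(logger);
     return 0;
   }
   ```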



##########
extensions/python/types/Types.h:
##########
@@ -160,7 +160,18 @@ class Long : public ReferenceHolder<reference_type> {
   }
 
   int64_t asInt64() {
-    return static_cast<int64_t>(PyLong_AsLongLong(this->ref_.get()));
+    auto long_value = PyLong_AsLongLong(this->ref_.get());
+    if (long_value == -1 && PyErr_Occurred()) {
+      throw PyException();
+    }
+    return static_cast<int64_t>(long_value);

Review Comment:
   I may be overly paranoid, but I'd prefer `gsl::narrow` instead of 
`static_cast`.
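   For context, `gsl::narrow` is the checked cast from the Microsoft GSL: it throws `gsl::narrowing_error` when the conversion changes the value. A minimal hand-rolled sketch of the same idea (the real `gsl::narrow` additionally rejects sign flips between signed and unsigned types):
   ```cpp
   #include <cstdint>
   #include <stdexcept>

   // Minimal sketch of what gsl::narrow does: cast, then verify the
   // round trip preserved the value; throw instead of silently truncating.
   template <class To, class From>
   To checked_narrow(From value) {
     const auto result = static_cast<To>(value);
     if (static_cast<From>(result) != value) {
       throw std::runtime_error("narrowing changed the value");
     }
     return result;
   }

   int main() {
     // Fits: no loss, behaves like static_cast.
     static_cast<void>(checked_narrow<std::int32_t>(123456789LL));
     // Does not fit in int8_t: throws instead of silently truncating.
     try {
       checked_narrow<std::int8_t>(300LL);
     } catch (const std::runtime_error&) {
       // expected
     }
     return 0;
   }
   ```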



##########
extensions/python/PYTHON.md:
##########
@@ -106,20 +127,39 @@ class VaderSentiment(object):
 To enable python Processor capabilities, the following options need to be 
provided in minifi.properties. The directory specified
 can contain processors. Note that the processor name will be the reference in 
your flow. Directories are treated like package names.
 Therefore if the nifi.python.processor.dir is /tmp/ and you have a 
subdirectory named packagedir with the file name file.py, it will
-produce a processor with the name 
org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory 
will append a package 
-to the reference class name. 
+produce a processor with the name 
org.apache.nifi.minifi.processors.packagedir.file. Note that each subdirectory 
will append a package
+to the reference class name.
 
     in minifi.properties
        #directory where processors exist
        nifi.python.processor.dir=XXXX
-       
-       
+
+
 ## Processors
 The python directory (extensions/pythonprocessors) contains implementations 
that will be available for flows if the required dependencies
 exist.
-   
-## Sentiment Analysis
+
+### Sentiment Analysis
 
 The SentimentAnalysis processor will perform a Vader Sentiment Analysis. This 
requires that you install nltk and VaderSentiment
                pip install nltk
                pip install VaderSentiment
+
+## Using NiFi Python Processors
+
+MiNiFi C++ supports the use of NiFi Python processors, that are inherited from 
the FlowFileTransform base class. To use these processors, you must copy the 
Python processor module under the nifi_python_processors directory located 
under the python directory (by default it can be found at 
${minifi_root}/minifi-python/nifi_python_processors). To see how to write NiFi 
Python processors, please refer to the Python Developer Guide under the [Apache 
NiFi documentation](https://nifi.apache.org/documentation/v2/).

Review Comment:
   I think this would be clearer:
   ```suggestion
   MiNiFi C++ supports the use of NiFi Python processors that inherit 
from the FlowFileTransform base class. To use these processors, copy the Python
processor module to the nifi_python_processors subdirectory of the python 
directory. By default, the python directory is ${minifi_root}/minifi-python. To 
see how to write NiFi Python processors, please refer to the Python Developer 
Guide under the [Apache NiFi 
documentation](https://nifi.apache.org/documentation/v2/).
   ```



##########
extensions/python/PythonScriptEngine.cpp:
##########
@@ -180,4 +198,66 @@ void PythonScriptEngine::evaluateModuleImports() {
   }
 }
 
+void PythonScriptEngine::initializeProcessorObject(const std::string& 
python_class_name) {
+  GlobalInterpreterLock gil;
+  if (auto python_class = bindings_[python_class_name]) {
+    auto num_args = [&]() -> size_t {
+      auto class_init = 
OwnedObject(PyObject_GetAttrString(python_class->get(), "__init__"));
+      if (!class_init.get()) {
+        return 0;
+      }
+
+      auto inspect_module = OwnedObject(PyImport_ImportModule("inspect"));
+      if (!inspect_module.get()) {
+        return 0;
+      }
+
+      auto inspect_args = 
OwnedObject(PyObject_CallMethod(inspect_module.get(), "getfullargspec", "O", 
class_init.get()));
+      if (!inspect_args.get()) {
+        return 0;
+      }
+
+      auto arg_list = OwnedObject(PyObject_GetAttrString(inspect_args.get(), 
"args"));
+      if (!arg_list.get()) {
+        return 0;
+      }
+
+      return PyList_Size(arg_list.get());
+    }();
+
+    if (num_args > 1) {
+      auto kwargs = OwnedDict::create();
+      auto value = OwnedObject(Py_None);
+      kwargs.put("jvm", value);
+      auto args = OwnedObject(PyTuple_New(0));

Review Comment:
   should this be `PyTuple_New(num_args)`?



##########
extensions/python/ExecutePythonProcessor.cpp:
##########
@@ -146,11 +151,38 @@ std::unique_ptr<PythonScriptEngine> 
ExecutePythonProcessor::createScriptEngine()
   auto engine = std::make_unique<PythonScriptEngine>();
 
   python_logger_ = 
core::logging::LoggerFactory<ExecutePythonProcessor>::getAliasedLogger(getName());
-  engine->initialize(Success, Failure, python_logger_);
+  engine->initialize(Success, Failure, Original, python_logger_);
 
   return engine;
 }
 
+core::Property* ExecutePythonProcessor::findProperty(const std::string& name) 
const {
+  if (auto prop_ptr = core::ConfigurableComponent::findProperty(name)) {
+    return prop_ptr;
+  }
+
+  auto it = ranges::find_if(python_properties_, [&name](const auto& item){
+    return item.getName() == name;
+  });
+  if (it != python_properties_.end()) {
+    return const_cast<core::Property*>(&*it);
+  }
+
+  return nullptr;
+}
+
+std::map<std::string, core::Property> ExecutePythonProcessor::getProperties() 
const {
+  auto result = ConfigurableComponent::getProperties();
+
+  std::lock_guard<std::mutex> lock(configuration_mutex_);

Review Comment:
   Why do we lock this mutex here? It's not clear to me why the mutex is 
`protected` instead of `private`, either.
   
   Elsewhere, we read and write `python_properties_` without any locking. Maybe 
we need a new mutex to protect `python_properties_`?



##########
extensions/python/types/PyProcessContext.cpp:
##########
@@ -65,12 +66,31 @@ PyObject* PyProcessContext::getProperty(PyProcessContext* 
self, PyObject* args)
     return nullptr;
   }
 
-  const char* property;
-  if (!PyArg_ParseTuple(args, "s", &property)) {
+  const char* property_name = nullptr;
+  PyObject* script_flow_file = nullptr;
+  if (!PyArg_ParseTuple(args, "s|O", &property_name, &script_flow_file)) {
     throw PyException();
   }
+
   std::string value;
-  context->getProperty(property, value);
+  if (!script_flow_file) {
+    if (!context->getProperty(property_name, value)) {
+      Py_RETURN_NONE;
+    }
+  } else {
+    auto py_flow = reinterpret_cast<PyScriptFlowFile*>(script_flow_file);
+    const auto flow_file = py_flow->script_flow_file_.lock();
+    if (!flow_file) {
+      PyErr_SetString(PyExc_AttributeError, "tried reading FlowFile outside 
'on_trigger'");
+      return nullptr;

Review Comment:
   What is the difference between `return nullptr` and `Py_RETURN_NONE`?
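   For context: in the CPython C API, returning `nullptr` from a `PyObject*`-returning function signals that a Python exception has been set (here via `PyErr_SetString`) and is propagating to the caller, while `Py_RETURN_NONE` returns the `None` singleton as a normal, successful result. A rough model of the two outcomes, with stand-in types instead of real CPython objects:
   ```cpp
   #include <string>

   // Stand-ins for PyObject and the pending-exception state, sketch only.
   struct FakePyObject {};
   FakePyObject fake_none;        // plays the role of Py_None
   std::string pending_error;     // plays the role of CPython's error indicator

   // Models PyErr_SetString(...) followed by `return nullptr`:
   // the caller receives no object and sees a pending exception.
   FakePyObject* fail_with_exception() {
     pending_error = "tried reading FlowFile outside 'on_trigger'";
     return nullptr;
   }

   // Models Py_RETURN_NONE: the call succeeds and the caller receives None.
   FakePyObject* succeed_with_none() {
     return &fake_none;
   }

   int main() {
     static_cast<void>(fail_with_exception());
     static_cast<void>(succeed_with_none());
     return 0;
   }
   ```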



##########
libminifi/src/core/ConfigurableComponent.cpp:
##########
@@ -36,19 +36,28 @@ ConfigurableComponent::ConfigurableComponent()
 
 ConfigurableComponent::~ConfigurableComponent() = default;
 
+Property* ConfigurableComponent::findProperty(const std::string& name) const {

Review Comment:
   I don't like that this const function returns a non-const pointer to its 
internals. Instead, we could have a virtual
   ```c++
   const Property* ConfigurableComponent::findProperty(const std::string& name) 
const;
   ```
   (without `const_cast`), and have a single `const_cast` in the non-virtual, 
non-const
   ```c++
   Property* ConfigurableComponent::findProperty(const std::string& name) {
     const auto& const_self = *this;
     return const_cast<Property*>(const_self.findProperty(name));
   }
   ```
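   The overload pair suggested above, sketched as a self-contained example with `std::string` standing in for `core::Property`: the virtual const version does the real lookup, and the non-virtual non-const overload holds the single `const_cast`.
   ```cpp
   #include <map>
   #include <string>

   // Sketch of the suggested pattern; std::string stands in for core::Property.
   class ComponentSketch {
    public:
     virtual ~ComponentSketch() = default;

     // Virtual const version: does the lookup, never mutates.
     virtual const std::string* findProperty(const std::string& name) const {
       const auto it = properties_.find(name);
       return it == properties_.end() ? nullptr : &it->second;
     }

     // Non-virtual, non-const overload: the one and only const_cast.
     std::string* findProperty(const std::string& name) {
       const auto& const_self = *this;
       return const_cast<std::string*>(const_self.findProperty(name));
     }

     void setProperty(const std::string& name, std::string value) {
       properties_[name] = std::move(value);
     }

    private:
     std::map<std::string, std::string> properties_;
   };

   int main() {
     ComponentSketch component;
     component.setProperty("nifi.python.processor.dir", "/tmp/");
     return 0;
   }
   ```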



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
