markap14 commented on code in PR #7003: URL: https://github.com/apache/nifi/pull/7003#discussion_r1142707556
########## nifi-docs/src/main/asciidoc/python-developer-guide.adoc: ########## @@ -0,0 +1,543 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. +// += NiFi Python Developer's Guide +Apache NiFi Team <d...@nifi.apache.org> +:homepage: http://nifi.apache.org +:linkattrs: + + +== Introduction + +This guide is intended to provide an introduction and some guidance to developing extensions for Apache NiFi using Python. +This guide is not intended to be an alternative to the link:developer-guide.adoc[NiFi Developers Guide] document but rather +a supplement to it. The normal Developer Guide is far more in depth and discusses more topics. However, that guide is +targeted toward Java developers. The philosophies and guidance offered in that guide, generally, still hold for Python extensions, though. + +[[java_python_comms]] +== Java/Python Communication + +While NiFi is a Java based application, we do allow for native Python based processors. In order for this to work, it is essential +that both the Java and Python processes be able to communicate with one another. To facilitate this, when a Python process is launched, +a server is started on both the Java and Python sides. 
This server is started in such a way that it listens only on local network interfaces. +That is, it is not possible to connect to either the Java or Python server from another machine. Connections must be made from localhost. +This provides an important security layer. + +There are objects on the Java side that must be made available to the Python side. Likewise, the Python side must return information to the Java +side. For example, the Java application is responsible for storing the flow definition, such as the fact that some Processor exists, the configuration +of that Processor, etc. It's also responsible for maintaining the FlowFiles and their data. This information must be conveyed over the socket from +the Java side to the Python side. Once a Python Processor performs its task and wants to route a given FlowFile to some relationship, this information +must also be conveyed back to the Java side. + +Fortunately, the NiFi API handles all of this and makes this seamless. This is handled by means of object proxies. + +=== Object Proxies + +Any time that a Java object must be made available to the Python API, it is made available via a proxy object. This means that in order to access a Java +object from Python, we need to simply call the appropriate method on the Python proxy. When this method is called, a message is generated by the Python +object and sent over the socket. That message is essentially an encoding of "Invoke method ABC on object XYZ, using arguments W, X, and Y." + +For example, if we have an `InputFlowFile` object named `flowFile` and we want the `filename` attribute, we can do so by calling: +---- +filename = flowFile.getAttribute('filename') +---- + +From the Python API perspective, this is all that is necessary. Behind the scenes, a message is written to the local socket that is an encoding of the +message "Invoke the getAttribute method on the object with ID 679212, with String argument 'filename'". 
+The Java process then receives this command, invokes the specified method on the object with the given identifier, and writes back to the socket the result +of that method invocation. As a result, the Python side receives the value of the "filename" attribute. + +This is important to understand, because it means that any method invocation that occurs on a Java object must be serialized, written over the socket, +deserialized, and then the method can be invoked. The result must then be serialized, written over the socket, and deserialized on the Python side. +While this is a fairly efficient process, it is not nearly as efficient as simply invoking a method natively. As a result, it is important to consider the overhead of +method invocations on Java objects. + +=== Object age-off + +It is also important to understand that any time that an object is provided as an argument to a Python Processor, that object can only be accessed on the Python +side as long as the object is made available on the Java side. Because the Java side cannot store all objects indefinitely, some cleanup must happen. This cleanup +happens immediately after the method is invoked. + +That means that if the `transform` method of a `FlowFileTransform` is called with a `ProcessContext` object, that object is available for use ONLY during the +method invocation. As soon as the method returns (successfully or not), the object will no longer be available for use. As a result, objects provided to method +invocations should not be stored for later use, such as assigning a value to `self.processContext`. + +Referencing an object that is no longer accessible will result in an error similar to: + +---- +py4j.Py4JException: An exception was raised by the Python Proxy. 
Return Message: Traceback (most recent call last): + File "/Users/mpayne/devel/nifi/nifi-assembly/target/nifi-2.0.0-SNAPSHOT-bin/nifi-2.0.0-SNAPSHOT/python/framework/py4j/java_gateway.py", line 2466, in _call_proxy + return_value = getattr(self.pool[obj_id], method)(*params) + File "/Users/mpayne/devel/nifi/nifi-assembly/target/nifi-2.0.0-SNAPSHOT-bin/nifi-2.0.0-SNAPSHOT/./python/extensions/SetRecordField.py", line 22, in transform + <Your Line of Python Code> + File "/Users/mpayne/devel/nifi/nifi-assembly/target/nifi-2.0.0-SNAPSHOT-bin/nifi-2.0.0-SNAPSHOT/python/framework/py4j/java_gateway.py", line 1460, in __str__ + return self.toString() + File "/Users/mpayne/devel/nifi/nifi-assembly/target/nifi-2.0.0-SNAPSHOT-bin/nifi-2.0.0-SNAPSHOT/python/framework/py4j/java_gateway.py", line 1322, in __call__ + return_value = get_return_value( + File "/Users/mpayne/devel/nifi/nifi-assembly/target/nifi-2.0.0-SNAPSHOT-bin/nifi-2.0.0-SNAPSHOT/python/framework/py4j/protocol.py", line 330, in get_return_value + raise Py4JError( +py4j.protocol.Py4JError: An error occurred while calling o15380.toString. Trace: +py4j.Py4JException: Target Object ID does not exist for this gateway :o15380 + at py4j.Gateway.invoke(Gateway.java:279) + at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) + at py4j.commands.CallCommand.execute(CallCommand.java:79) + at py4j.GatewayConnection.run(GatewayConnection.java:238) + at org.apache.nifi.py4j.server.NiFiGatewayConnection.run(NiFiGatewayConnection.java:91) + at java.lang.Thread.run(Thread.java:750) +---- + +The message "Target Object ID does not exist for this gateway..." indicates that the Python code is attempting to access a Java object +that is no longer accessible. + + +[[processor_api]] +== Processor API + +In the initial release of the feature that makes Python a first-class citizen for extensions, we will focus purely on Processors. 
+Initially, there will be no ability to develop Controller Services in Python, though Python-based Processors may make use of +existing Controller Services. + +The Processor API that exists for Java is more general and less prescriptive than the Python counterpart. +The Java API allows for a very wide array of possibilities in terms of the +types of components that may be built. The Python API, on the other hand, is more narrowly scoped and prescriptive. +There are many reasons for this: + + - It is easier to encourage best practices for components with a more narrowly focused API. + - Most of the use cases in which we see a need for Python-based extension points (based on previous use of scripting +Processors such as ExecuteScript) tend to be around data manipulation and/or complex evaluation of data. + - More narrowly focused APIs result in code that requires less boilerplate. + - Calls from Python to Java (and vice versa) are far more expensive than native method calls. Having APIs that are more tailored toward +specific use cases allows for fewer interactions between the two processes, which greatly improves performance. + +As a result, the Python API consists of two different Processor classes that can be implemented: `FlowFileTransform` and `RecordTransform`. +Others may emerge in the future. + + + +[[flowfile-transform]] +=== FlowFileTransform + +The `FlowFileTransform` API provides a mechanism for routing and transforming a FlowFile based on its attributes as well as its +textual or binary contents. Contrast this with the `RecordTransform` API, which provides a mechanism for routing and transforming +individual Records (such as JSON, Avro or CSV Records, for example). + +In order to implement the `FlowFileTransform` API, a Python class must extend from the `nifiapi.FlowFileTransform` class +and implement the `transform(ProcessContext, InputFlowFile)` method, which returns a `FlowFileTransformResult`. 
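The shape of that return type can be sketched with a minimal stand-in class (the real `FlowFileTransformResult` comes from `nifiapi.flowfiletransform`; the stub below exists only to illustrate the calling convention, in which `relationship` is required and `contents` and `attributes` are optional):

```python
# Minimal stand-in for nifiapi.flowfiletransform.FlowFileTransformResult,
# shown only to illustrate the calling convention; not the real class.
class FlowFileTransformResult:
    def __init__(self, relationship, contents=None, attributes=None):
        self.relationship = relationship  # required: Relationship to route to
        self.contents = contents          # None leaves the contents unchanged
        self.attributes = attributes      # None adds/overwrites no attributes

# An attribute-only result: contents is omitted, so the original
# FlowFile content is kept as-is.
result = FlowFileTransformResult(relationship="success",
                                 attributes={"processed": "true"})
```

In a real Processor, the result object is simply returned from the `transform` method and the framework applies the routing, content, and attribute changes.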
+ +Additionally, the Processor class must provide two pieces of information as subclasses: the Java interface that it implements +(which will always be `org.apache.nifi.python.processor.FlowFileTransform`) and any details about the Processor, such as the +version, a description, keywords/tags that might be associated with the Processor, etc. +These will be discussed in more detail below, in the <<inner-classes>> section. + +As such, a simple implementation may look like this: +---- +from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult + +class WriteHelloWorld(FlowFileTransform): + class Java: + implements = ['org.apache.nifi.python.processor.FlowFileTransform'] + class ProcessorDetails: + version = '0.0.1-SNAPSHOT' + + def __init__(self, **kwargs): + super().__init__(**kwargs) + + def transform(self, context, flowfile): + return FlowFileTransformResult(relationship = "success", contents = "Hello World", attributes = {"greeting": "hello"}) +---- + +The `transform` method is expected to take two arguments: the context (of type `nifiapi.properties.ProcessContext`) and +the flowfile (of type `InputFlowFile`). + +The return type is a `FlowFileTransformResult` that indicates which Relationship the FlowFile should be transferred to, +the updated contents of the FlowFile, and any attributes that should be added to the FlowFile (or overwritten). The +`relationship` is a required argument. The `contents` is optional. If the contents of the FlowFile are not to be updated, +the `contents` should be unspecified or should be specified as `None`. The original FlowFile contents should not be returned, +as doing so has the same effect as passing `None` but is more expensive, because the contents will be written out to the FlowFile again. +Likewise, it is more efficient to omit the `attributes` unless there are attributes to add. + + +[[process-context]] +==== context + +The `context` parameter is of type `nifiapi.properties.ProcessContext`. 
This class can be used to determine configuration, such as the +Processor's name (via the `context.getName()` method) and property values (via the `context.getProperties()` and `context.getProperty(String propertyName)` methods). + +Note that the `getProperty(String)` method does not return a String representation of the configured value but rather a `PythonPropertyValue` object. +This allows for the property's value to be interpreted in different ways. For example, `PythonPropertyValue.getValue()` returns the String representation +of the value. `PythonPropertyValue.asInteger()` returns `None` or an integer representation of the value. + +`PythonPropertyValue.asTimePeriod(nifiapi.properties.TimeUnit)` can be used to retrieve the configured value as some time period. +For example, if a property named "timeout" is set to a value of "30 sec", we could use +`context.getProperty("timeout").asTimePeriod(TimeUnit.MILLISECONDS)` and this would return to us a value of `30000`. This allows for a better +user experience than requiring properties to follow a certain convention such as seconds or milliseconds, while still allowing you, as a Processor +developer, to easily obtain the value in whatever units make the most sense for the use case. + +The `PythonPropertyValue.asControllerService()` method can be used in order to obtain a Controller Service that can be used by the Processor. + +The `PythonPropertyValue` object also provides the ability to call the `evaluateAttributeExpressions(attributeMap=None)` method. +This can be used to evaluate the configured Expression Language. For example, if a value of `${unknown}` is used for a property value, +we can use `context.getProperty("my property").evaluateAttributeExpressions(flowFile).getValue()` in order to evaluate the Expression Language +expression and then get the String representation of the value. + + +==== flowfile + +The FlowFile is a proxy to the Java `InputFlowFile` object. 
This exposes the following methods: + +`getContentsAsBytes` : returns the contents of the FlowFile as a byte array. This method should be used conservatively, as it loads the entire contents +of the FlowFile into a byte array on the Java side, and then sends a copy to the Python side. As a result, the FlowFile's contents are buffered into memory +twice, once on the Java heap and once in the Python process. + +`getContentsAsReader` : returns a Java `BufferedReader` that can be used to read the contents of the FlowFile one line at a time. While this is only applicable +for textual content, it avoids loading the entire FlowFile's contents into memory. However, each invocation of `BufferedReader.readLine()` does require a call +to Java, so the performance may not compare to that of calling `getContentsAsBytes`. + +`getSize` : returns the number of bytes in the FlowFile's contents. + +`getAttribute(String name)` : returns the value of the FlowFile's attribute with the given name, or `None` if the FlowFile does not have +an attribute with that name. + +`getAttributes()` : returns a Python dictionary whose keys are FlowFile attributes' names and whose values are the associated attribute values. + + + +==== FlowFileTransformResult + +After the Processor has performed its task, the Processor must return an instance of `nifiapi.flowfiletransform.FlowFileTransformResult`. +The constructor has a single required positional argument, the `relationship` to route the FlowFile to. Additionally, if the contents of +the FlowFile are to be updated, the FlowFile's new contents should be returned via the `contents` argument. Any FlowFile attributes that +are to be added or modified may additionally be provided using the `attributes` argument. + + + +[[record-transform]] +=== RecordTransform + +While the `FlowFileTransform` API provides the ability to operate on one FlowFile at a time, the `RecordTransform` API provides developers +with the opportunity to operate on a single Record at a time. 
For example, if a FlowFile is made up of many JSON Records, the `RecordTransform` +Processor can be used to operate on each individual record without worrying about whether the Records are colocated or not. +Implementations of this API must extend from the `RecordTransform` base class and must also implement the following method: + +`def transform(self, context, record, schema, attributemap)` + +returning a `RecordTransformResult` object. + +The `context` object is an implementation of the same `ProcessContext` that is used in the `FlowFileTransform` Processor +(see <<process-context>>). The `record` is a Python dictionary that represents the record to operate on. Regardless of whether +the source of the record is JSON, CSV, Avro, or some other input format, this method is provided a Python dictionary. This makes +it far simpler to operate on the data within Python and means that the code is very portable, as it can operate on any format of +data. + +The associated `schema` object is an instance of a Java object, `org.apache.nifi.record.serialization.RecordSchema`. This provides a +schema for the data. However, calls to the schema must be made over the socket to the Java side and, as such, are expensive. + +Finally, the method signature provides an `attributemap`. This `attributemap` has two methods: + +`getAttribute(String name)` : returns the value of the FlowFile's attribute with the given name, or `None` if the FlowFile does not have +an attribute with that name. + +`getAttributes()` : returns a Python dictionary whose keys are FlowFile attributes' names and whose values are the associated attribute values. + +Note that these two methods are identical to those in the `InputFlowFile` class discussed above. This allows the `attributemap` to be +provided to a `PythonPropertyValue` in order to evaluate Expression Language. 
For example, we might determine the name of a record's field to use +for some operation by calling: +---- +field_name = context.getProperty("Field Name").evaluateAttributeExpressions(attributemap).getValue() +---- + +Finally, the method must return an instance of `nifiapi.recordtransform.RecordTransformResult`. + +The `RecordTransformResult` constructor takes four optional named arguments: + +`record` : the transformed version of the Record. If the record is not supplied, or if `None` is supplied, the input Record will be +dropped from the output. + +`schema` : the transformed schema. If this is not specified, the schema will be inferred. However, if the schema is specified, the schema +is binding, not the data. So, if a field is missing from the schema, for instance, it will be dropped from the data. And if the schema has a field +in it and there's no corresponding value in the data, the field will be assumed to have a value of `None`. + +`relationship` : the name of the Relationship to route the Record to. If not specified, the Record will be routed to the "success" relationship. +However, the implementation may choose to expose relationships other than "success" and "failure" and route records accordingly. For example, +the implementation may want to route a Record to either "valid" or "invalid." + +`partition` : By default, all Records in a given incoming FlowFile will be written to a single output FlowFile (or, more accurately, the transformed version +of the Record will be, assuming that a value of `None` is not returned for the result's `record` field). However, we may want to partition +the incoming data into separate output FlowFiles. For example, we could have incoming data that has a "country" field and want a separate output FlowFile +for each country. In this case, we would return a Python dictionary for the `partition` argument that looks something like `{'country': record['country']}`. 
+If the partition has more than one field in the dictionary, all fields in the dictionary must have the same values for two Records in order for +the Records to be written to the same output FlowFile. + + + +[[property-descriptors]] +=== PropertyDescriptors + +An important aspect of any software is the ability to configure it. With NiFi, Processors are configured by their properties. +In order to convey which properties are available, a Processor must expose a `PropertyDescriptor` for each property. The `PropertyDescriptor` +contains all of the information necessary in order to convey how to configure the property. + +A `PropertyDescriptor` is created using the `nifiapi.properties.PropertyDescriptor` class. The constructor takes two required positional +arguments: `name` and `description`. All other arguments are optional. + +Typically, a Processor will have multiple Property Descriptors. These descriptors are then returned to the NiFi framework by implementing the following +method in the Processor (regardless of whether it is a `FlowFileTransform` or a `RecordTransform`): +---- +def getPropertyDescriptors(self) +---- + +This method returns a list of PropertyDescriptors. The typical convention is to create the Property Descriptors in the Processor's constructor +and then return them in this method, such as: + +---- +from nifiapi.flowfiletransform import FlowFileTransform +from nifiapi.properties import PropertyDescriptor, StandardValidators + +class PrettyPrintJson(FlowFileTransform): +... + def __init__(self, **kwargs): + super().__init__(**kwargs) + + numspaces = PropertyDescriptor(name="Number of Spaces", + description="Number of spaces to use for pretty-printing", + validators=[StandardValidators.POSITIVE_INTEGER_VALIDATOR], + defaultValue="4", + required=True) + self.descriptors = [numspaces] + +... 
+ + def getPropertyDescriptors(self): + return self.descriptors +---- + +There are times, however, when a Processor developer wants to allow users to specify their own properties. For example, we may allow users to enter +multiple key/value pairs where the key is the name of a Record field to set and the value is the value to set it to. +To accomplish this, we implement the following method: + +---- +def getDynamicPropertyDescriptor(self, propertyname): +---- +which returns a `PropertyDescriptor`. For example: +---- +def getDynamicPropertyDescriptor(self, propertyname): + return PropertyDescriptor(name=propertyname, + description="A user-defined property", + dynamic=True) # dynamic=True is optional and included here only for completeness' sake +---- + +If this method is not implemented and a user adds a property other than those that are explicitly supported, the Processor will become +invalid. Of course, we might also specify explicit validators that can be used, etc. + + + +[[relationships]] +=== Relationships + +Each Processor in NiFi must route its outgoing data to some destination. In NiFi, those destinations are called "Relationships." +Each Processor is responsible for declaring its Relationships. + +Both the FlowFileTransform and RecordTransform Processors already have a Relationship named `original` and one named `failure`. +The `original` relationship should not be used by implementations. This is used only by the framework and allows the input FlowFile +to be passed on without modification. If the Processor cannot transform its input (because the data is not valid, for example), +the Processor may route the data to the `failure` relationship. + +By default, both implementations also have a `success` relationship. However, a Processor may override the Relationships that it +defines. It does this by implementing the following method: +---- +def getRelationships(self) +---- +This method returns a list or a set of `nifiapi.relationship.Relationship` objects. 
If this method is implemented, the `success` +Relationship will not automatically be made available. It will need to be created and returned within this list, if it is to be used. +Regardless of which Relationships are exposed by the implementation, the `failure` and `original` Relationships will always be made available. + + +[[inner-classes]] +=== ProcessorDetails and Java inner classes + +As noted above, the `ProcessorDetails` and `Java` inner classes are important to Processors. The `Java` inner class must be defined +on all Processors and must include a member named `implements` that is a list of Java interfaces that the class implements. This is +important, as it allows the Py4J protocol to understand how to interact with this object from the Java side. + +The `ProcessorDetails` class tells NiFi about the Processor so that it can allow configuration of the Processor seamlessly through the NiFi UI. +Additionally, it provides details about what is necessary in order to use the Processor. +The `ProcessorDetails` class may have several different members: + +`version` : The implementation version of the Processor + +`description` : A description that can be presented in the UI to explain how the Processor is to be used. This may be more than +a single sentence but should be kept to a few sentences or a short paragraph. + +`tags` : a list of Strings that indicates tags or keywords that are associated with the Processor. When adding a Processor to the +NiFi canvas via the UI, users may search by keyword for discoverability. For example, if a user were to search for +"CSV", any Processor whose name contains the letters "CSV" would show up. Additionally, any Processor that has a "CSV" tag would also show up. + +`dependencies` : A list of Strings that are PyPI dependencies that the Processor depends on. The format of these strings is the same +as would be provided to `pip install`. See <<dependencies>> for more information. 
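Putting these members together, a complete pair of inner classes might look like the following sketch (the description, tags, and dependency values here are illustrative placeholders, not from a real Processor; in practice, both classes are declared inside the Processor class):

```python
# Sketch of the Java and ProcessorDetails inner classes described above.
# The description, tags, and dependencies values are illustrative only.
class Java:
    implements = ['org.apache.nifi.python.processor.FlowFileTransform']

class ProcessorDetails:
    version = '0.0.1-SNAPSHOT'
    description = 'Pretty-prints incoming JSON FlowFiles.'
    tags = ['json', 'pretty-print', 'example']
    dependencies = ['pandas', 'numpy==1.20.0']
```

Note that all four `ProcessorDetails` members are plain class attributes; NiFi reads them when discovering the Processor, so no methods need to be implemented.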
+ + +[[logging]] +=== Logging in NiFi + +NiFi logging works much the same way as in any other application, with one important difference. NiFi aims to make the +user interface intuitive and informative, and as part of that experience will surface log messages that are appropriate. +In order to accommodate this, Processors should not instantiate their own loggers. Instead, Processors should simply +make use of `self.logger`. This will be injected into the Processor after the Processor has been created. Of course, it can't +be made available before the Processor has been created, so it cannot be accessed from within the constructor. However, it can +be used anywhere else. + + + +[[lifecycle]] +=== Lifecycle Methods + +Oftentimes, it is necessary to create an expensive object once and reuse it, rather than creating it, using it, and throwing it away each time. +In order to make this simpler to handle, NiFi provides a method named `onScheduled`. This method is optionally implemented in the Processor. +If the method is implemented, it is defined as: +---- +def onScheduled(self, context) +---- +where `context` is a `ProcessContext` as described earlier. The method has no return value. +This method is invoked once whenever a Processor is scheduled to run (regardless of whether it's being started due to user input, NiFi restart, etc.). + +Similarly, it is often necessary to tear down resources when they are no longer needed. This can be accomplished +by implementing the following method: +---- +def onStopped(self, context) +---- +This method is called once whenever the Processor has been stopped and no longer has any active tasks. It is safe to assume +that there are no longer any invocations of the `transform` method running when this method is called. + + + +[[requirements]] +== Requirements + +The Python API requires that Python 3.9+ is available on the machine hosting NiFi. + +Each Processor may have its own list of requirements / dependencies. 
These are made available to the Processor by creating a separate +environment for each Processor implementation (not for each instance of a Processor on the canvas). PyPI is then used to install these +dependencies in that environment. + + +[[deploying]] +== Deploying a Developed Processor + +Once a Processor has been developed, it can be made available in NiFi by copying the source of the Python extension to the `$NIFI_HOME/python/extensions` directory (the default location). +The directories in which to look for extensions can be configured in `nifi.properties` via properties that have the prefix `nifi.python.extensions.source.directory.`. +For example, by default, `nifi.python.extensions.source.directory.default` is set to `./python/extensions`. However, additional paths may be added by replacing `default` +in the property name with some other value. + +Any `.py` file found in the directory will be parsed and examined in order to determine whether or not it is a valid NiFi Processor. +In order to be found, the Processor must have a valid parent (`FlowFileTransform` or `RecordTransform`) and must have an inner class named `Java` +with an `implements = ['org.apache.nifi.python.processor.FlowFileTransform']` or `implements = ['org.apache.nifi.python.processor.RecordTransform']`. +This will allow NiFi to automatically discover the Processor. + +Note, however, that if the Processor implementation is broken into multiple Python modules, those modules will not be made available by default. In order +to package a Processor along with its modules, the Processor and any related module must be added to a directory that is directly below the Extensions directory. 
+For example, if the `WriteNumber.py` file contains a NiFi Processor and also depends on the `ProcessorUtils.py` module, the directory structure would look like this: +---- +NIFI_HOME/ + - python/ + - extensions/ + ProcessorA.py + ProcessorB.py + write-number/ + __init__.py + ProcessorUtils.py + WriteNumber.py +---- +By packaging them together in a subdirectory, NiFi knows to expose the modules to one another. However, the `ProcessorA` module will have no access +to the `ProcessorUtils` module. Only `WriteNumber` will have access to it. + + +[[reloading]] +== Processor Reloading + +Oftentimes, while developing a Processor, the easiest way to verify and modify its behavior is to make small tweaks and re-run +the data. This is possible in NiFi without restarting. Once a Processor has been discovered and loaded, any changes to the Processor's source code will +take effect whenever the Processor is started again (or during certain other events, such as validation, while the Processor is stopped). + +So we can easily update the source code for a Processor, start it, verify the results, stop the Processor, and update again as necessary. +Or, more simply, click "Run Once" to verify the behavior; modify if necessary; and Run Once again. +It is important to note, however, that if the Processor could not be successfully loaded the first time, NiFi may not monitor it for changes. +Therefore, it's important to ensure that the Processor is in a good working state before attempting to load it in NiFi. Otherwise, NiFi will need to be +restarted in order to discover the Processor and load it again. + +Because NiFi allows for multiple extension directories to be deployed, it might be helpful when developing a new extension to add the source directory +where the extension is being developed as a NiFi extension source directory. This allows developers to work on Processors in their IDE and allows NiFi +to pick up any changes seamlessly as soon as the Processor is started. 
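For example, a development directory could be registered alongside the default one in `nifi.properties` (the `dev` property suffix and the path shown below are placeholders for whatever suffix and local directory the developer chooses):

```properties
# Default extensions directory, as shipped with NiFi
nifi.python.extensions.source.directory.default=./python/extensions

# Hypothetical additional source directory pointing at a local IDE project
nifi.python.extensions.source.directory.dev=/path/to/my/processor/project
```

With both properties set, NiFi scans both directories for Processors, so changes saved in the IDE project are picked up the next time the Processor is started.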
+ + +[[dependencies]] +== Adding Third-Party Dependencies + +Third-party dependencies are defined for a Processor using the `dependencies` member of the `ProcessorDetails` inner class. +This is a list of Strings that indicate the PyPI modules that the Processor depends on. The format is the same as would be provided to `pip install`. + +For example, to indicate that a Processor needs `pandas` installed, the implementation might +look like this: +---- +class PandasProcessor(FlowFileTransform): + class Java: + implements = ['org.apache.nifi.python.processor.FlowFileTransform'] + class ProcessorDetails: + version = '0.0.1-SNAPSHOT' + dependencies = ['pandas'] +---- + +However, it is often necessary to declare a specific version of a dependency. And it may also be necessary to define multiple dependencies. +We can do that in this manner: +---- +class PandasProcessor(FlowFileTransform): + class Java: + implements = ['org.apache.nifi.python.processor.FlowFileTransform'] + class ProcessorDetails: + version = '0.0.1-SNAPSHOT' + dependencies = ['pandas', 'numpy==1.20.0'] +---- + +Here, we accept any version of `pandas` (the latest available version will typically be installed), and we require version `1.20.0` of `numpy`. + + +[[dependency-isolation]] +=== Dependency Isolation + +On startup, NiFi will create a separate Python env (pyenv) for each Processor implementation and will use PyPI to install Review Comment: Yes, yes, I did mean that. :) Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org