briansolo1985 commented on PR #8963:
URL: https://github.com/apache/nifi/pull/8963#issuecomment-2175926065
Thanks for the feedback @exceptionfactory.
I was aware of that limitation. The current solution works for all of the
Python processors currently present in the NiFi codebase, except for the
allowable values of ChunkDocument's LANGUAGE property, where None is returned.
My feeling was that it is somewhat allowed / tolerated in the Python
processor world, as the comment says on the related Java interface:
```
/**
* A list of property descriptions for the known properties. Note that unlikely Java Processors, Python-based Processors are
* more dynamic, and the properties may not all be discoverable
*
* @return a list of descriptions for known properties
*/
List<PropertyDescription> getPropertyDescriptions();
```
Regarding a more generic solution, it might be more complex and demanding.
I'm not a Python expert, and maybe there are already answers, but I came
across the following difficulties:
* Retrieving values with AST gets complicated when the assigned value is an
ast.Call (e.g. object creation) or an ast.Name (e.g. assignment by name). For the
former we need to implement the creation logic for each object type; for the
latter we need to build a lookup map from names to the objects they reference, so
that we get the actual object and not just the name. Building the lookup map runs
into the same difficulties.
* Python is super permissive - as the Javadoc comment says correctly. For
example we can define constants outside and inside classes. Another example is
the syntax.
Given we defined a class:
```
class Descriptor:
    def __init__(self, name, description):
        self.name = name
        self.description = description
```
We can instantiate the object in three different ways:
```
Descriptor("name1", "desc1")
Descriptor("name2", description="desc2")
Descriptor(name="name3", description="desc3")
```
From the AST's perspective these are all different cases, each of which has to
be handled in code. Real classes are usually more complex, which results in
handling code that is either rather complex or specific to each class.
* Evaluating functions: assignments can be more complex than assigning a
constant value or instantiating a new object. For example, the value can be a
list comprehension, as in ChunkDocument. Processing expressions like this is
possible with AST, although it's rather complex and hard to make generic.
* Definitions spanning multiple Python modules. An AST is built for a single
Python module and is not aware of its dependencies. Although it seems possible
to process import statements while traversing the AST, that would again lead to
a complex and hard-to-maintain codebase.
A bigger problem is that at manifest build time not all Python dependencies are
guaranteed to be there: Python processors are loaded asynchronously and download
their dependencies only after the processor is started as part of a flow. The
manifest, in contrast, is created during startup, when a flow is not
necessarily present.
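To illustrate the AST cases listed above, here is a minimal sketch. It is not the code in this PR: the constants `LANGUAGES`, `DEFAULT` and `COMPUTED` and the `describe` helper are made up for the example, and only `Descriptor` comes from the snippet above. It parses a small module and reports what the right-hand side of each top-level assignment looks like in the AST.

```python
import ast

# Hypothetical module source, for illustration only.
SOURCE = '''
LANGUAGES = ["en", "de"]
DEFAULT = LANGUAGES
COMPUTED = [lang.upper() for lang in LANGUAGES]
d1 = Descriptor("name1", "desc1")
d2 = Descriptor("name2", description="desc2")
d3 = Descriptor(name="name3", description="desc3")
'''


def describe(node):
    """Classify the right-hand side of an assignment (hypothetical helper)."""
    try:
        # literal_eval only accepts plain literals (strings, numbers, lists, ...).
        return ("literal", ast.literal_eval(node))
    except (ValueError, TypeError):
        pass
    if isinstance(node, ast.Call):
        # Positional and keyword arguments are stored in separate fields.
        return ("call", len(node.args), [kw.arg for kw in node.keywords])
    if isinstance(node, ast.Name):
        # Only the name is available; resolving it needs a lookup map.
        return ("name", node.id)
    # Anything else (e.g. a list comprehension) would need real evaluation.
    return ("unsupported", type(node).__name__)


results = {}
for stmt in ast.parse(SOURCE).body:
    if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
        results[stmt.targets[0].id] = describe(stmt.value)

for name, info in results.items():
    print(name, info)
```

Only `LANGUAGES` resolves to an actual value. `DEFAULT` yields just a name, `COMPUTED` is an `ast.ListComp` that would have to be evaluated, and the three `Descriptor` calls each carry a different mix of positional and keyword arguments.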
In summary my intention was to provide a good enough solution to make
"allowable values" present in the manifest, while considering the current
limitations we have.
For the dependency download issue,
https://issues.apache.org/jira/browse/NIFI-12959 can be an answer. If the
dependencies are self-contained, the NAR could also contain an
extension-manifest.xml, which could then simply be picked up by
StandardRuntimeManifestService, making the whole parsing logic unnecessary.
Still, Python processors without self-contained NARs would require a
way to provide information for the manifest.
Besides "allowable values", "relationships" and "property dependencies" are
also missing from the manifest.
Given the above reasons, I'm not confident that AST is the best tool to
achieve this.
What do you think? Is there already a plan or a suggested approach here? I'm
open to any suggestions / solutions, and happy to participate in implementing
them.