briansolo1985 commented on PR #8963:
URL: https://github.com/apache/nifi/pull/8963#issuecomment-2175926065

   Thanks for the feedback @exceptionfactory.
   
   I was aware of that limitation. The current solution works for all of the 
Python processors currently present in the NiFi codebase, except for the 
allowable values of ChunkDocument's LANGUAGE property, where None is returned.
   My feeling was that this is somewhat tolerated in the Python processor 
world, as the comment on the related Java interface says:
   ```
       /**
        * A list of property descriptions for the known properties. Note that 
unlikely Java Processors, Python-based Processors are
        * more dynamic, and the properties may not all be discoverable
        *
        * @return a list of descriptions for known properties
        */
       List<PropertyDescription> getPropertyDescriptions();
   ```
   
   Regarding a more generic solution, it might be more complex and demanding.
   I'm not a Python expert, and maybe there are already answers, but I came 
across the following difficulties:
   * Retrieving values with AST gets complicated when the assigned value is an 
ast.Call (e.g. object creation) or an ast.Name (e.g. assignment by name). For 
the former we would need to implement the creation logic for each object type. 
For the latter we would need a lookup map from names to the objects they 
reference, so that we get the actual object and not just the name. Building 
that lookup map runs into the same difficulties.
   * Python is super permissive, as the Javadoc comment correctly says. For 
example, constants can be defined both outside and inside classes. Another 
example is the call syntax.
   Given we defined a class:
   ```
   class Descriptor:
       def __init__(self, name, description):
           self.name = name
           self.description = description
   ```
   We can instantiate the object in three different ways:
   ```
   Descriptor("name1", "desc1")
   Descriptor("name2", description="desc2")
   Descriptor(name="name3", description="desc3")
   ```
   From the AST's perspective these are all different cases, each of which has 
to be handled in code. Real classes are usually more complex, so the handling 
code ends up either rather complex or specific to each class.
   * Evaluating functions: assignments can be more complex than assigning a 
constant value or instantiating a new object. For example, the value can be a 
list comprehension, as in ChunkDocument. Processing expressions like this is 
possible with AST, but it is rather complex and hard to make generic.
   * Definitions spanning multiple Python modules. An AST is built for a 
single Python module and is not aware of its dependencies. Although it seems 
possible to process import statements while traversing the AST, that would 
again lead to a complex and hard-to-maintain codebase.
   A bigger problem is that at manifest build time not all Python dependencies 
are guaranteed to be present: Python processors are loaded asynchronously and 
download their dependencies only after the processor has been started as part 
of a flow. The manifest, however, is created during startup, when a flow is 
not necessarily present.
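
To make the AST difficulties above concrete, here is a minimal sketch using 
only the standard `ast` module. It is not the code from this PR: the `Example` 
class, its attribute names, and the helpers `describe` and `class_assignments` 
are hypothetical and exist only for illustration. It shows that literal values 
can be recovered directly, while the three `Descriptor(...)` call forms, a 
reference by name, and a list comprehension each surface as a different node 
shape that would need its own handling.

```python
import ast

# Hypothetical module source, parsed but never imported or executed.
SOURCE = '''
LANG = "en"

class Descriptor:
    def __init__(self, name, description):
        self.name = name
        self.description = description

class Example:
    A = ["x", "y"]
    B = Descriptor("name1", "desc1")
    C = Descriptor("name2", description="desc2")
    D = Descriptor(name="name3", description="desc3")
    E = LANG
    F = [v.upper() for v in A]
'''


def describe(node):
    """Summarise how an assigned value appears in the AST."""
    try:
        # Works only for pure literals (strings, numbers, lists of them, ...).
        return ("literal", ast.literal_eval(node))
    except (ValueError, TypeError):
        pass
    if isinstance(node, ast.Call):
        # Positional and keyword arguments live in separate fields, so each
        # call style yields a different shape for the same constructor.
        return ("call", len(node.args), [k.arg for k in node.keywords])
    if isinstance(node, ast.Name):
        # Only the name is known here; resolving it needs a lookup map.
        return ("name", node.id)
    return ("unsupported", type(node).__name__)


def class_assignments(source, class_name):
    """Map simple class-level assignment targets to a summary of their value."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for stmt in node.body:
                if (isinstance(stmt, ast.Assign)
                        and len(stmt.targets) == 1
                        and isinstance(stmt.targets[0], ast.Name)):
                    result[stmt.targets[0].id] = describe(stmt.value)
    return result


if __name__ == "__main__":
    for name, summary in class_assignments(SOURCE, "Example").items():
        print(name, summary)
```

Only `A` comes back as a usable value. `B`, `C` and `D` would each need 
argument-binding logic per class, `E` needs name resolution (possibly across 
modules), and `F` would need the comprehension to be evaluated.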
   
   In summary my intention was to provide a good enough solution to make 
"allowable values" present in the manifest, while considering the current 
limitations we have.
   
   For the dependency download issue, 
https://issues.apache.org/jira/browse/NIFI-12959 could be an answer. If the 
dependencies are self-contained, it should also be possible to include an 
extension-manifest.xml in the NAR, which could then simply be picked up by 
StandardRuntimeManifestService, making the whole parsing logic unnecessary.
   
   Still, Python processors without self-contained NARs would require a way to 
provide information for the manifest.
   Besides "allowable values", "relationships" and "property dependencies" are 
also missing.
   Given the above reasons, I'm not confident that AST is the best tool to 
achieve this.
   
   What do you think? Is there already a plan or a suggested approach here? I'm 
open to any suggestions or solutions, and happy to participate in implementing 
them.

