[jira] [Commented] (NIFI-11240) Introduce Python API for building Processors

Janis Ax (Jira) Wed, 04 Oct 2023 04:51:05 -0700


    [ 
https://issues.apache.org/jira/browse/NIFI-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771832#comment-17771832
 ]


Janis Ax commented on NIFI-11240:
---------------------------------

[~markap14] I suggest adding some examples to the documentation so that new 
users can get ideas and orientation. You already provided some 
[examples|https://drive.google.com/drive/folders/1VCtNQmThAHL44-t2ORdav9YPIHMvCk_b];
maybe we can reuse them.

I think the following topics would be useful to cover:
 * Logging
 * Working with content
 * Working with attributes
 * Working with record-oriented data
 * Processor as a package / module
 * Working with properties
 * Relationships
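
To make a few of these topics concrete (logging, content, attributes, relationships), here is a rough sketch of what such a documentation example might look like. It is not taken from the linked examples. The method names (transform, getContentsAsBytes, getAttribute) and the FlowFileTransformResult arguments follow my understanding of the Python API and may change while the API is unstable. FlowFileTransformResult and FakeFlowFile below are hypothetical stand-ins so the snippet runs outside NiFi, and UppercaseContent is an invented processor name.

```python
import logging


class FlowFileTransformResult:
    """Stand-in (assumption) for the result type the framework provides."""

    def __init__(self, relationship, attributes=None, contents=None):
        self.relationship = relationship
        self.attributes = attributes or {}
        self.contents = contents


class FakeFlowFile:
    """Hypothetical test double exposing only the accessors used below."""

    def __init__(self, contents, attributes):
        self._contents = contents
        self._attributes = attributes

    def getContentsAsBytes(self):
        return self._contents

    def getAttribute(self, name):
        return self._attributes.get(name)


class UppercaseContent:
    """Sketch of a transform-style processor that uppercases text content."""

    def __init__(self, **kwargs):
        # Inside NiFi the framework supplies the logger; a stdlib logger
        # stands in for it here.
        self.logger = logging.getLogger("UppercaseContent")

    def transform(self, context, flowfile):
        # Working with attributes: read one by name.
        filename = flowfile.getAttribute("filename")
        try:
            # Working with content: read the bytes and decode them.
            text = flowfile.getContentsAsBytes().decode("utf-8")
        except UnicodeDecodeError:
            # Logging plus relationships: report and route to "failure".
            self.logger.error(f"Cannot decode {filename} as UTF-8")
            return FlowFileTransformResult(relationship="failure")
        self.logger.info(f"Uppercasing {filename}")
        # Route to "success" with new content and one added attribute.
        return FlowFileTransformResult(
            relationship="success",
            attributes={"uppercased": "true"},
            contents=text.upper(),
        )
```

Outside NiFi this can be exercised with UppercaseContent().transform(None, FakeFlowFile(b"hello", {"filename": "a.txt"})). In a real processor the stand-ins would be replaced by the framework's own classes.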

 

> Introduce Python API for building Processors
> --------------------------------------------
>
>                 Key: NIFI-11240
>                 URL: https://issues.apache.org/jira/browse/NIFI-11240
>             Project: Apache NiFi
>          Issue Type: Epic
>          Components: Core Framework, Documentation & Website, Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 2.0.0
>
>
> The scripting processors are very common for data transformation in NiFi. In 
> particular, the Jython-based scripts are quite heavily used. However, Jython 
> runs on the JVM and does not support CPython libraries. As a result, it is 
> syntax-compatible but cannot make use of the wealth of Python libraries, and 
> that wealth of libraries is what makes Python popular to begin with.
> Additionally, use of many script-based processors hurts the UX. They are 
> cumbersome to configure, with script files and/or script bodies. They result 
> in a dataflow that's difficult to understand because instead of nicely named 
> processors like CompressContent the type and default name are 
> "ExecuteScript." They're also difficult to share.
> I have been playing with Py4J to introduce a true Python-based API for 
> developing Processors. This will introduce new APIs, new framework changes, 
> and documentation, and it will likely take a while to stabilize. However, 
> the sooner we are able to get it into the hands of users, the better. 
> Therefore, I propose that we introduce it in multiple milestones. We can create 
> sub-tickets for the different milestones, but in general it should proceed as follows:
> Milestone 1: Initial implementation. Provides the capability and an API for 
> building processors. Includes sample code and some documentation. Includes 
> tests to ensure proper operation. Should not be used in production. API will 
> not be stable and may change frequently. Performance may be subpar. Get into 
> the hands of developers to begin exploring and providing feedback / 
> submitting PRs.
> Milestone 2: Bug fixes. API refinement. Improve performance.
> Milestone 3: Additional bug fixes and API refinement. API should become more 
> stable.
> Milestone 4: Additional bug fixes. API becomes stable. Documentation is clear 
> and sufficient. Recommend production use.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
