Team,

Thanks to collaboration with Joe Witt, the new nifi-python-extensions
repository [1] is now populated with the initial set of Python
Processors.

The repository includes a standard GitHub workflow for pull request
validation that checks license headers and Python code formatting.

The project uses Hatch [2] to run code formatting as well as build
source and binary distribution packages.

The source distribution and binary wheel packages both contain the
Python Processors, which can be placed into an Apache NiFi 2
installation.
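
As a rough illustration of placing the wheel contents into an installation, here is a minimal sketch. It relies only on the fact that a wheel is a standard zip archive. The function name extract_processors and both paths in the usage note are hypothetical, and the correct target directory depends on how the NiFi installation is configured.

```python
import zipfile
from pathlib import Path


def extract_processors(wheel_path, extensions_dir):
    """Copy every .py file from a wheel into extensions_dir.

    Returns the sorted archive-relative names of the files extracted.
    Hypothetical helper for illustration; not part of the repository.
    """
    target = Path(extensions_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            # Skip packaging metadata (*.dist-info) and non-Python files.
            if ".dist-info/" in name or not name.endswith(".py"):
                continue
            wheel.extract(name, target)
            extracted.append(name)
    return sorted(extracted)
```

For example, extract_processors("dist/example.whl", "/opt/nifi/python/extensions") would unpack the Processor sources under the given directory. Both paths are placeholders.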

The source distribution archive will provide a suitable release
candidate file when we are ready to release a version of
nifi-python-extensions.

In Jira, there is a new python-extensions-2.0.0 version to target for
features and fixes.

There is certainly more room for documentation and improvement, but
this should provide a reasonable foundation for decoupled Python
Processor development efforts.

Regards,
David Handermann

[1] https://github.com/apache/nifi-python-extensions
[2] https://hatch.pypa.io

On Sat, Jun 22, 2024 at 9:06 AM David Handermann
<exceptionfact...@apache.org> wrote:
>
> Joe,
>
> Thanks for raising the discussion, and thanks to everyone for the feedback 
> thus far. This tracks our previous discussion on the topic [1].
>
> I am also strongly in favor of separating out extensions into their own 
> repositories for many of the reasons already mentioned. Starting with a 
> single dedicated repository named nifi-python-extensions should be a good 
> opportunity to prove out the concept. I agree that considering the Java
> extensions is more involved, and I think we should consider that separately.
>
> I would be glad to handle the initial work of creating the new repository and 
> setting up the initial build structure. I am familiar with the work necessary 
> to publish to the PyPI repository, and that could provide an optional 
> distribution channel. We would still make the source distribution available 
> through standard Apache channels, following standard project policies. Based 
> on the current structure, it should be straightforward to download an archive 
> of the Python extensions and expand that for those who want to have them as 
> part of their NiFi installation.
>
> Once the initial repository is in place, the last initial step would be a
> pull request to remove the Python extensions from the main repository.
>
> I can proceed along these lines, unless any substantive objections come up, 
> and the pull request process will also provide opportunity for additional 
> consideration and review.
>
> Regards,
> David Handermann
>
> [1] https://lists.apache.org/thread/nok561sg1dzw3zrott06gkl34hdjxbb3
>
> On Fri, Jun 21, 2024, 9:14 PM Marton Szasz <sza...@apache.org> wrote:
>>
>> Hi Joe, Arpad and all,
>>
>> I'm strongly in favor of moving all Python components to a separate
>> repository. It could be called apache/nifi-python-extensions or
>> -components, and contain all Python components that this community
>> maintains. I would prefer that over a separate repo for each extension,
>> because it seems easier to keep track of all components maintained by
>> the community if they are in the same repo, than if they were separate.
>>
>> Since MiNiFi C++ implemented a large subset of the same Python API, I
>> think it makes our lives easier if we share the code, and keep the
>> Python components in their own dedicated location. As NiFi and MiNiFi
>> C++ both approach their next major version, and we commit to a stable
>> Python API, I expect it to become easier to maintain the Python
>> components separately, targeting this stable API, and we could align the
>> release frequency with the maintenance needs of the Python components.
>>
>> I'm neutral on whether to package them with convenience binaries or
>> leave that up to the user. Hopefully we can come up with a user-friendly
>> way to install them if they're not included. I wouldn't include them in
>> the source tarballs.
>>
>> I would keep the Java components separate from Python components.
>> Whether that's in the NiFi repo or somewhere else, both are fine with me.
>>
>> Regarding introducing breaking changes: on the NiFi side, unit tests
>> should cover the API well enough, and after 2.0 GA, I expect it to
>> remain backwards-compatible until the next major version. So I think the
>> API will not be a moving target (things only added, not changed), and it
>> will be easy to keep things working. But I think we should set up
>> automated testing that runs tests with the extensions, checking their
>> functionality with NiFi 2.0, NiFi latest, and at least one MiNiFi C++
>> version, to catch breakages early if they rear their heads anyway.
>>
>> Thanks and have a great weekend,
>> Marton
>>
>>
>> On 6/21/24 23:18, Joe Witt wrote:
>> > "I would suggest starting with
>> > moving the Python ones to a dedicated repo, let's have a workflow
>> > established and polished there, might follow with some Java ones in case it
>> > works well."
>> >
>> > Yeah kinda where my head is too
>> >
>> > On Fri, Jun 21, 2024 at 2:07 PM Arpad Boda <ab...@apache.org> wrote:
>> >
>> >> Joe,
>> >>
>> >> Interesting thoughts, I see a lot of pros and cons. Let me list the most
>> >> important ones of both:
>> >> +CVEs in extensions don't make NiFi "vulnerable" automatically as they
>> >> live in a different repo.
>> >> +the responsibility of being up-to-date is being moved to the maintainers
>> >> of the given extension, same applies for the stability of the tests
>> >> covering that extension
>> >>
>> >> -easier to introduce breaking changes accidentally: a breaking change
>> >> might go through and get committed. Especially in case of Java
>> >> extensions, the Python API is pretty thin (yet!). Only an extension
>> >> developer will find it, most probably not immediately, when things
>> >> already depend on the breaking change and it gets very difficult to
>> >> make the right call in this case
>> >> -might lose some extensions as they get even less maintained than they are
>> >> now
>> >>
>> >> Overall I have no strong opinion either way, I would suggest starting
>> >> with moving the Python ones to a dedicated repo, let's have a workflow
>> >> established and polished there, might follow with some Java ones in
>> >> case it works well.
>> >>
>> >> Cheers,
>> >> Arpad
>> >>
>> >> On Friday, June 21, 2024, Joe Witt <joew...@apache.org> wrote:
>> >>
>> >>> Team,
>> >>>
>> >>> For the longest time we had all these Java based extensions and it was
>> >>> often inconvenient for them to live within the codebase.  Indeed it makes
>> >>> the builds crazy long and it delays getting new components out.  We had a
>> >>> lot of work to do for this to be convenient and perhaps we still have
>> >>> gaps remaining.
>> >>>
>> >>> Now we have these Python components.  I am not confident we really want
>> >>> these in the codebase for similar but even more important reasons.  The
>> >>> python components have similar issues when it comes to Licensing and
>> >>> Notice recognition.  They have their own rapid vulnerability tracking.
>> >>> Our current tooling doesn't make tracking that very easy.
>> >>>
>> >>> I'm concerned about where the Python ones are heading in terms of
>> >>> maintainability but also generally for the builds as well with the Java
>> >>> ones.  Is it time to move to a repo for the Java extensions and its own
>> >>> project/group name and versioning?  Same for Python extensions?
>> >>>
>> >>> This lets them evolve on their own schedule.  It does bring up an
>> >>> interesting challenge as it relates to a convenience binary.  The ideal
>> >>> state is extensions are released and shipped independent of the nifi
>> >>> application.  But we'd need to make that really nice/easy for the users.
>> >>>
>> >>> We have a lot going on so maybe still not time to tackle this.  Curious
>> >>> to hear thoughts
>> >>>
>> >>> Thanks
>> >>>
