Very cool, David! I think a focus on Python is very beneficial to the entire
NiFi ecosystem.

Jeremy Dyer

________________________________
From: Joe Witt <joe.w...@gmail.com>
Sent: Wednesday, June 26, 2024 5:38:31 PM
To: dev@nifi.apache.org <dev@nifi.apache.org>
Subject: Re: Python extensions in their own repo...

Great work David - thanks!

On Wed, Jun 26, 2024 at 2:36 PM David Handermann <
exceptionfact...@apache.org> wrote:

> Team,
>
> Thanks to collaboration with Joe Witt, the new nifi-python-extensions
> repository [1] is now populated with the initial set of Python
> Processors.
>
> The repository includes a standard GitHub workflow for pull request
> validation that checks license headers and Python code formatting.
>
> The project uses Hatch [2] to run code formatting as well as build
> source and binary distribution packages.
>
> The source distribution and binary wheel packages both contain the
> Python Processors, which can be placed into an Apache NiFi 2
> installation.
>
> The source distribution archive will provide a suitable release
> candidate file when we are ready to release a version of
> nifi-python-extensions.
>
> In Jira, there is a new python-extensions-2.0.0 version to target for
> features and fixes.
>
> There is certainly more room for documentation and improvement, but
> this should provide a reasonable foundation for decoupled Python
> Processor development efforts.
>
> Regards,
> David Handermann
>
> [1] https://github.com/apache/nifi-python-extensions
> [2] https://hatch.pypa.io
>
> On Sat, Jun 22, 2024 at 9:06 AM David Handermann
> <exceptionfact...@apache.org> wrote:
> >
> > Joe,
> >
> > Thanks for raising the discussion, and thanks to everyone for the
> > feedback thus far. This tracks our previous discussion on the topic [1].
> >
> > I am also strongly in favor of separating out extensions into their own
> > repositories for many of the reasons already mentioned. Starting with a
> > single dedicated repository named nifi-python-extensions should be a good
> > opportunity to prove out the concept. I agree that considering the Java
> > extensions is more involved, and I think we should consider that separately.
> >
> > I would be glad to handle the initial work of creating the new
> > repository and setting up the initial build structure. I am familiar with
> > the work necessary to publish to the PyPI repository, and that could
> > provide an optional distribution channel. We would still make the source
> > distribution available through standard Apache channels, following standard
> > project policies. Based on the current structure, it should be
> > straightforward to download an archive of the Python extensions and expand
> > that for those who want to have them as part of their NiFi installation.
> >
> > Once the initial repository is in place, the last initial step would be
> > a pull request to remove the Python extensions from the main repository.
> >
> > I can proceed along these lines, unless any substantive objections come
> > up, and the pull request process will also provide opportunity for
> > additional consideration and review.
> >
> > Regards,
> > David Handermann
> >
> > [1] https://lists.apache.org/thread/nok561sg1dzw3zrott06gkl34hdjxbb3
> >
> > On Fri, Jun 21, 2024, 9:14 PM Marton Szasz <sza...@apache.org> wrote:
> >>
> >> Hi Joe, Arpad and all,
> >>
> >> I'm strongly in favor of moving all Python components to a separate
> >> repository. It could be called apache/nifi-python-extensions or
> >> -components, and contain all Python components that this community
> >> maintains. I would prefer that over a separate repo for each extension,
> >> because it seems easier to keep track of all components maintained by
> >> the community if they are in the same repo, than if they were separate.
> >>
> >> Since MiNiFi C++ implemented a large subset of the same Python API, I
> >> think it makes our lives easier if we share the code, and keep the
> >> Python components in their own dedicated location. As NiFi and MiNiFi
> >> C++ both approach their next major version, and we commit to a stable
> >> Python API, I expect it to become easier to maintain the Python
> >> components separately, targeting this stable API, and we could align the
> >> release frequency with the maintenance needs of the Python components.
> >>
> >> I'm neutral on whether to package them with convenience binaries or
> >> leave that up to the user. Hopefully we can come up with a user-friendly
> >> way to install them if they're not included. I wouldn't include them in
> >> the source tarballs.
> >>
> >> I would keep the Java components separate from Python components.
> >> Whether that's in the NiFi repo or somewhere else, both are fine with
> >> me.
> >>
> >> Regarding introducing breaking changes: on the NiFi side, unit tests
> >> should cover the API well enough, and after 2.0 GA, I expect it to
> >> remain backwards-compatible until the next major version. So I think the
> >> API will not be a moving target (things only added, not changed), and it
> >> will be easy to keep things working. But I think we should set up
> >> automated testing that runs tests with the extensions, checking their
> >> functionality with NiFi 2.0, NiFi latest, and at least one MiNiFi C++
> >> version, to catch breakages early if they rear their heads anyway.
> >>
> >> Thanks and have a great weekend,
> >> Marton
> >>
> >>
> >> On 6/21/24 23:18, Joe Witt wrote:
> >> > "I would suggest starting with
> >> > moving the Python ones to a dedicated repo, let's have a workflow
> >> > established and polished there, might follow with some Java ones in
> >> > case it works well."
> >> >
> >> > Yeah kinda where my head is too
> >> >
> >> > On Fri, Jun 21, 2024 at 2:07 PM Arpad Boda <ab...@apache.org> wrote:
> >> >
> >> >> Joe,
> >> >>
> >> >> Interesting thoughts, I see a lot of pros and cons. Let me list the
> >> >> most important ones of both:
> >> >> +CVEs in extensions don't make NiFi "vulnerable" automatically, as
> >> >> they live in a different repo.
> >> >> +the responsibility of being up-to-date is being moved to the
> >> >> maintainers of the given extension, same applies for the stability
> >> >> of the tests covering that extension
> >> >>
> >> >> -easier to introduce breaking changes accidentally: a breaking
> >> >> change might go through and get committed. Especially in case of
> >> >> Java extensions, the Python API is pretty thin (yet!). Only an
> >> >> extension developer will find it, most probably not immediately,
> >> >> when things already depend on the breaking change and it gets very
> >> >> difficult to make the right call in this case
> >> >> -might lose some extensions as they get even less maintained than
> >> >> they are now
> >> >>
> >> >> Overall I have no strong opinion either way, I would suggest
> >> >> starting with moving the Python ones to a dedicated repo, let's have
> >> >> a workflow established and polished there, might follow with some
> >> >> Java ones in case it works well.
> >> >>
> >> >> Cheers,
> >> >> Arpad
> >> >>
> >> >> On Friday, June 21, 2024, Joe Witt <joew...@apache.org> wrote:
> >> >>
> >> >>> Team,
> >> >>>
> >> >>> For the longest time we had all these Java based extensions and it
> >> >>> was often inconvenient for them to live within the codebase.  Indeed
> >> >>> it makes the builds crazy long and it delays getting new components
> >> >>> out.  We had a lot of work to do for this to be convenient and
> >> >>> perhaps we still have gaps remaining.
> >> >>>
> >> >>> Now we have these Python components.  I am not confident we really
> >> >>> want these in the codebase for similar but even more important
> >> >>> reasons.  The Python components have similar issues when it comes to
> >> >>> Licensing and Notice recognition.  They have their own rapid
> >> >>> vulnerability tracking.  Our current tooling doesn't make tracking
> >> >>> that very easy.
> >> >>>
> >> >>> I'm concerned about where the Python ones are heading in terms of
> >> >>> maintainability but also generally for the builds as well with the
> >> >>> Java ones.  Is it time to move to a repo for the Java extensions and
> >> >>> its own project/group name and versioning?  Same for Python
> >> >>> extensions?
> >> >>>
> >> >>> This lets them evolve on their own schedule.  It does bring up an
> >> >>> interesting challenge as it relates to a convenience binary.  The
> >> >>> ideal state is extensions are released and shipped independent of
> >> >>> the nifi application.  But we'd need to make that really nice/easy
> >> >>> for the users.
> >> >>>
> >> >>> We have a lot going on so maybe still not time to tackle this.
> >> >>> Curious to hear thoughts
> >> >>>
> >> >>> Thanks
> >> >>>
>
