Hi all,

We tried out Chesnay's proposal and went with Option 2. Unfortunately, we
ran into some tough nuts to crack and feel like we have hit a dead end:
- The main pain point with the outlined Frankensteinian connector repo is
how to handle shared code / infra code. If we keep it on some <common>
branch, then we need to merge the common branch into the connector branch
on every update. However, it's unclear to me how improvements to the
common code that naturally appear while working on a specific connector
flow back into the common branch. You can't open a pull request from your
connector branch, or else your connector code would poison the
connector-less common branch. So you would probably manually copy the
files over to a branch based on common and create a PR from that (see the
git sketch after this list).
- A weird solution could be to have the common branch as a submodule in
the repo itself (if that's even possible). I'm sure that this setup would
blow the minds of all newcomers.
- Similarly, it's mandatory to have safeguards against code from
connector A poisoning connector B, common, or main (see the guard-script
sketch after this list). I had a similar setup in the past, and code from
two "distinct" branch types constantly bled into each other.
- We could also say that we simply release <common> independently and
just have a Maven (SNAPSHOT) dependency on it. But that would create a
weird flow whenever you need to change something in common, because you
would constantly have to switch branches back and forth.
- In general, the Frankensteinian approach is very switch-intensive. If
you maintain 3 connectors and need to fix one build instability in each
at the same time (quite common nowadays for some reason) and you have 2
review rounds, you need to switch branches 9 times, ignoring changes to
common.
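
For the common-branch workflow described in the first point, upstreaming
a shared fix could look roughly like this (a hedged sketch; the
tools/common path and the branch names are assumptions, not an agreed
layout):

    # Create a fresh branch based on common and pull over only the shared
    # files that were fixed while working on the kafka connector branch:
    git checkout common
    git checkout -b common-port-ci-fix
    git checkout kafka-main -- tools/common/
    git commit -m "Port CI utility fix from the kafka branch"
    # The PR is then opened from common-port-ci-fix against common, so no
    # connector code ever enters the common branch.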
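
The safeguard from the third point could be a CI step that rejects
changes touching foreign paths; again a hedged sketch with assumed branch
and directory names:

    #!/usr/bin/env bash
    # Fail if a change on this connector branch touches files outside the
    # connector's own directory or the shared directory.
    set -euo pipefail
    allowed=("flink-connector-kafka/" "tools/common/")
    git diff --name-only origin/common...HEAD | while read -r file; do
      ok=false
      for prefix in "${allowed[@]}"; do
        if [[ "$file" == "$prefix"* ]]; then ok=true; fi
      done
      if [[ "$ok" != true ]]; then
        echo "ERROR: $file is outside the allowed paths of this branch" >&2
        exit 1
      fi
    done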

Additionally, we still have the rather user/dev-unfriendly main branch
that is mostly empty. I'm also not sure we can generate an overview
README.md to make it friendlier, because in theory every connector branch
should be based on main, so we would get merge conflicts.

I'd like to propose once again to go with individual repositories.
- The only downside that we have discussed so far is that there is more
initial setup to do. Since we would grow the number of connector
repositories organically, that load is quite distributed. After finding a
good approach, we can offer templates that can even be used by outside
organizations.
- Regarding secrets, I think it's actually an advantage that the Kafka
connector has no access to the AWS secrets. If there are secrets to be
shared across connectors, we can and should use Azure's Variable Groups
(I have used them in the past to share Nexus creds across repos; see the
pipeline fragment after this list). That would also make rotation easy.
- Working on different connectors would be rather easy, as all modern
IDEs support multi-repo setups in the same project. You would still need
to do multiple releases in case you update common code (accessed either
through Nexus or a git submodule; see the pom sketch after this list) and
want to release your connector.
- There is no difference with respect to how many CI runs there are in
either approach.
- Individual repositories also have the advantage of allowing external
incubation. Let's assume someone builds connector A and hosts it in their
organization (a very common setup). If they want to contribute the code
to Flink, we could simply transfer the repository into the ASF after
ensuring Flink coding standards. That way we retain the git history and
GitHub issues.
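
For the Variable Groups point, the azure-pipelines.yml of an individual
connector repo could pull shared credentials like this (a hedged sketch;
the group and variable names are made up for illustration):

    variables:
      - group: flink-connectors-release  # shared library group, e.g. Nexus creds

    steps:
      - script: ./mvnw -B deploy -DskipTests
        env:
          NEXUS_USER: $(NEXUS_USER)  # secrets must be mapped into env explicitly
          NEXUS_PW: $(NEXUS_PW)

Rotating a credential then means updating the variable group once instead
of touching every repo.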
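
And for common code consumed through Nexus, each connector pom would just
declare a regular dependency; the coordinates below are assumptions:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-common</artifactId>
      <version>1.0-SNAPSHOT</version>
    </dependency>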

Is there any point that I'm missing?

On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> wrote:

> For sharing workflows we should be able to use composite actions. We'd
> have the main definition files in the flink-connectors repo, that we
> also need to tag/release, which other branches/repos can then import.
> These are also versioned, so we don't have to worry about accidentally
> breaking stuff.
> These could also be used to enforce certain standards / interfaces such
> that we can automate more things (e.g., integration into the Flink
> documentation).
>
> It is true that Option 2) and dedicated repositories share a lot of
> properties. While I did say in an offline conversation that we in that
> case might just as well use separate repositories, I'm not so sure
> anymore. One repo would make administration a bit easier, for example
> secrets wouldn't have to be applied to each repo (we wouldn't want
> certain secrets to be set up organization-wide).
> I overall also like that one repo would present a single access point;
> you can't "miss" a connector repo, and I would hope that having it as
> one repo would nurture more collaboration between the connectors, which
> after all need to solve similar problems.
>
> It is a fair point that the branching model would be quite weird, but I
> think that would subside pretty quickly.
>
> Personally I'd go with Option 2, and if that doesn't work out we can
> still split the repo later on. (Which should then be a trivial matter of
> copying all <connector>/* branches and renaming them).
>
> On 26/11/2021 12:47, Till Rohrmann wrote:
> > Hi Arvid,
> >
> > Thanks for updating this thread with the latest findings. The described
> > limitations for a single connector repo sound suboptimal to me.
> >
> > * Option 2. sounds as if we try to simulate multi connector repos inside
> of
> > a single repo. I also don't know how we would share code between the
> > different branches (sharing infrastructure would probably be easier
> > though). This seems to have the same limitations as dedicated repos with
> > the downside of having a not very intuitive branching model.
> > * Isn't option 1. kind of a degenerated version of option 2. where we
> have
> > some unrelated code from other connectors in the individual connector
> > branches?
> > * Option 3. has the downside that someone creating a release has to
> release
> > all connectors. This means that she either has to sync with the different
> > connector maintainers or has to be able to release all connectors on her
> > own. We are already seeing in the Flink community that releases require
> > quite good communication/coordination between the different people
> working
> > on different Flink components. Given our goals to make connector releases
> > easier and more frequent, I think that coupling different connector
> > releases might be counter-productive.
> >
> > To me it sounds not very practical to mainly use a mono repository w/o
> > having some more advanced build infrastructure that, for example, allows
> to
> > have different git roots in different connector directories. Maybe the
> mono
> > repo can be a catch all repository for connectors that want to be
> released
> > in lock-step (Option 3.) with all other connectors the repo contains. But
> > for connectors that get changed frequently, having a dedicated repository
> > that allows independent releases sounds preferable to me.
> >
> > What utilities and infrastructure code do you intend to share? Using git
> > submodules can definitely be one option to share code. However, it might
> > also be ok to depend on flink-connector-common artifacts which could make
> > things easier. Where I am unsure is whether git submodules can be used to
> > share infrastructure code (e.g. the .github/workflows) because you need
> > these files in the repo to trigger the CI infrastructure.
> >
> > Cheers,
> > Till
> >
> > On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:
> >
> >> Hi Brian,
> >>
> >> Thank you for sharing. I think your approach is very valid and is in
> line
> >> with what I had in mind.
> >>
> >> Basically Pravega community aligns the connector releases with the
> Pravega
> >>> mainline release
> >>>
> >> This certainly would mean that there is little value in coupling
> connector
> >> versions. So it's making a good case for having separate connector
> repos.
> >>
> >>
> >>> and maintains the connector with the latest 3 Flink versions(CI will
> >>> publish snapshots for all these 3 branches)
> >>>
> >> I'd like to give connector devs a simple way to express to which Flink
> >> versions the current branch is compatible. From there we can generate
> the
> >> compatibility matrix automatically and optionally also create different
> >> releases per supported Flink version. Not sure if the latter is indeed
> >> better than having just one artifact that happens to run with multiple
> >> Flink versions. I guess it depends on what dependencies we are
> exposing. If
> >> the connector uses flink-connector-base, then we probably need separate
> >> artifacts with poms anyways.
> >>
> >> Best,
> >>
> >> Arvid
> >>
> >> On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:
> >>
> >>> Hi Arvid,
> >>>
> >>> For branching model, the Pravega Flink connector has some experience
> what
> >>> I would like to share. Here[1][2] is the compatibility matrix and wiki
> >>> explaining the branching model and releases. Basically Pravega
> community
> >>> aligns the connector releases with the Pravega mainline release, and
> >>> maintains the connector with the latest 3 Flink versions(CI will
> publish
> >>> snapshots for all these 3 branches).
> >>> For example, recently we have 0.10.1 release[3], and in maven central
> we
> >>> need to upload three artifacts(For Flink 1.13, 1.12, 1.11) for 0.10.1
> >>> version[4].
> >>>
> >>> There are some alternatives. Another solution that we once discussed
> but
> >>> finally got abandoned is to have an independent version just like the
> >>> current CDC connector, and then give a big compatibility matrix to
> users.
> >>> We think it would be too confusing as the connector develops. On the
> >>> contrary, we can also do the opposite way to align with Flink version
> and
> >>> maintain several branches for different system versions.
> >>>
> >>> I would say this is only a fairly-OK solution because it is a bit
> painful
> >>> for maintainers as cherry-picks are very common and releases would
> >> require
> >>> much work. However, if neither system has nice backward
> >>> compatibility, there seems to be no comfortable solution for their
> >>> connector.
> >>>
> >>> [1] https://github.com/pravega/flink-connectors#compatibility-matrix
> >>> [2]
> >>>
> >>
> https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
> >>> [3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
> >>> [4] https://search.maven.org/search?q=pravega-connectors-flink
> >>>
> >>> Best Regards,
> >>> Brian
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Arvid Heise <ar...@apache.org>
> >>> Sent: Friday, November 19, 2021 4:12 PM
> >>> To: dev
> >>> Subject: Re: [DISCUSS] Creating an external connector repository
> >>>
> >>>
> >>> Hi everyone,
> >>>
> >>> we are currently in the process of setting up the flink-connectors repo
> >>> [1] for new connectors but we hit a wall that we currently cannot get past:
> >>> branching model.
> >>> To reiterate the original motivation of the external connector repo: We
> >>> want to decouple the release cycle of a connector from Flink. However,
> if
> >>> we want to support semantic versioning in the connectors with the
> ability
> >>> to introduce breaking changes through major version bumps and support
> >>> bugfixes on old versions, then we need release branches similar to how
> >>> Flink core operates.
> >>> Consider two connectors, let's call them kafka and hbase. We have kafka
> >> in
> >>> version 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change)
> >> and
> >>> hbase only on 1.0.A.
> >>>
> >>> Now our current assumption was that we can work with a mono-repo under
> >> ASF
> >>> (flink-connectors). Then, for release-branches, we found 3 options:
> >>> 1. We would need to create some ugly mess with the cross product of
> >>> connector and version: so you have kafka-release-1.0,
> kafka-release-1.1,
> >>> kafka-release-2.0, hbase-release-1.0. The main issue is not the amount
> of
> >>> branches (that's something that git can handle) but that the state of
> >>> kafka is undefined in hbase-release-1.0. That's a recipe for disaster and
> >>> makes releasing connectors very cumbersome (CI would only execute and
> >>> publish hbase SNAPSHOTS on hbase-release-1.0).
> >>> 2. We could avoid the undefined state by having an empty master and
> each
> >>> release branch really only holds the code of the connector. But that's
> >> also
> >>> not great: any user that looks at the repo and sees no connector would
> >>> assume that it's dead.
> >>> 3. We could have synced releases similar to the CDC connectors [2].
> That
> >>> means that if any connector introduces a breaking change, all
> connectors
> >>> get a new major. I find that quite confusing to a user if hbase gets a
> >> new
> >>> release without any change because kafka introduced a breaking change.
> >>>
> >>> To fully decouple release cycles and CI of connectors, we could add
> >>> individual repositories under ASF (flink-connector-kafka,
> >>> flink-connector-hbase). Then we can apply the same branching model as
> >>> before. I quickly checked if there are precedents in the apache
> >> community
> >>> for that approach and just by scanning alphabetically I found cordova
> >> with
> >>> 70 and couchdb with 77 apache repos respectively. So it certainly seems
> >>> like other projects approached our problem in that way and the apache
> >>> organization is okay with that. I currently expect max 20 additional
> >> repos
> >>> for connectors and in the future 10 max each for formats and
> filesystems
> >> if
> >>> we would also move them out at some point in time. So we would be at a
> >>> total of 50 repos.
> >>>
> >>> Note for all options, we need to provide a compatibility matrix that we
> aim
> >>> to autogenerate.
> >>>
> >>> Now for the potential downsides that we internally discussed:
> >>> - How can we ensure common infrastructure code, utilities, and quality?
> >>> I propose to add a flink-connector-common that contains all these
> things
> >>> and is added as a git submodule/subtree to the repos.
> >>> - Do we implicitly discourage connector developers from maintaining more
> than
> >>> one connector with a fragmented code base?
> >>> That is certainly a risk. However, I currently also see few devs
> working
> >>> on more than one connector. On the other hand, it may actually help keep the
> >> devs
> >>> that maintain a specific connector on the hook. We could use github
> >> issues
> >>> to track bugs and feature requests and a dev can focus his limited time
> >> on
> >>> getting that one connector right.
> >>>
> >>> So WDYT? Compared to some intermediate suggestions with split repos,
> the
> >>> big difference is that everything remains under Apache umbrella and the
> >>> Flink community.
> >>>
> >>> [1] https://github.com/apache/flink-connectors
> >>> [2] https://github.com/ververica/flink-cdc-connectors/
> >>>
> >>> On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> I created the flink-connectors repo [1] to advance the topic. We would
> >>>> create a proof-of-concept in the next few weeks as a special branch
> >>>> that I'd then use for discussions. If the community agrees with the
> >>>> approach, that special branch will become the master. If not, we can
> >>>> reiterate over it or create competing POCs.
> >>>>
> >>>> If someone wants to try things out in parallel, just make sure that
> >>>> you are not accidentally pushing POCs to the master.
> >>>>
> >>>> As a reminder: We will not move out any current connector from Flink
> >>>> at this point in time, so everything in Flink will remain as is and be
> >>>> maintained there.
> >>>>
> >>>> Best,
> >>>>
> >>>> Arvid
> >>>>
> >>>> [1] https://github.com/apache/flink-connectors
> >>>>
> >>>> On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>>  From the discussion, it seems to me that we have different opinions
> >>>>> whether to have an ASF umbrella repository or to host them outside of
> >>>>> the ASF. It also seems that this is not really the problem to solve.
> >>>>> Since there are many good arguments for either approach, we could
> >>>>> simply start with an ASF umbrella repository and see how people adopt
> >>>>> it. If the individual connectors cannot move fast enough or if people
> >>>>> prefer to not buy into the more heavy-weight ASF processes, then they
> >>>>> can host the code also somewhere else. We simply need to make sure
> >>>>> that these connectors are discoverable (e.g. via flink-packages).
> >>>>>
> >>>>> The more important problem seems to be to provide common tooling
> >>>>> (testing, infrastructure, documentation) that can easily be reused.
> >>>>> Similarly, it has become clear that the Flink community needs to
> >>>>> improve on providing stable APIs. I think it is not realistic to
> >>>>> first complete these tasks before starting to move connectors to
> >>>>> dedicated repositories. As Stephan said, creating a connector
> >>>>> repository will force us to pay more attention to API stability and
> >>>>> also to think about which testing tools are required. Hence, I
> >>>>> believe that starting to add connectors to a different repository
> >>>>> than apache/flink will help improve our connector tooling (declaring
> >>>>> testing classes as public, creating a common test utility repo,
> >>>>> creating a repo
> >>>>> template) and vice versa. Hence, I like Arvid's proposed process as
> >>>>> it will start kicking things off w/o letting this effort fizzle out.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>> On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org>
> >> wrote:
> >>>>>> Thank you all, for the nice discussion!
> >>>>>>
> >>>>>>  From my point of view, I very much like the idea of putting
> >>>>>> connectors
> >>>>> in a
> >>>>>> separate repository. But I would argue it should be part of Apache
> >>>>> Flink,
> >>>>>> similar to flink-statefun, flink-ml, etc.
> >>>>>>
> >>>>>> I share many of the reasons for that:
> >>>>>>    - As argued many times, reduces complexity of the Flink repo,
> >>>>> increases
> >>>>>> response times of CI, etc.
> >>>>>>    - Much lower barrier of contribution, because an unstable
> >>>>>> connector
> >>>>> would
> >>>>>> not de-stabilize the whole build. Of course, we would need to make
> >>>>>> sure
> >>>>> we
> >>>>>> set this up the right way, with connectors having individual CI
> >>>>>> runs,
> >>>>> build
> >>>>>> status, etc. But it certainly seems possible.
> >>>>>>
> >>>>>>
> >>>>>> I would argue some points a bit different than some cases made
> >> before:
> >>>>>> (a) I believe the separation would increase connector stability.
> >>>>> Because it
> >>>>>> really forces us to work with the connectors against the APIs like
> >>>>>> any external developer. A mono repo is somehow the wrong thing if
> >>>>>> you in practice want to actually guarantee stable internal APIs at
> >>> some layer.
> >>>>>> Because the mono repo makes it easy to just change something on
> >>>>>> both
> >>>>> sides
> >>>>>> of the API (provider and consumer) seamlessly.
> >>>>>>
> >>>>>> Major refactorings in Flink need to keep all connector API
> >>>>>> contracts intact, or we need to have a new version of the connector
> >>> API.
> >>>>>> (b) We may even be able to go towards more lightweight and
> >>>>>> automated releases over time, even if we stay in Apache Flink with
> >>> that repo.
> >>>>>> This isn't fully aligned with the Apache release policies yet,
> >>>>>> but there are board discussions about whether there can be
> >>>>>> bot-triggered releases (by dependabot) and how that could fit into
> >>> the Apache process.
> >>>>>> This doesn't seem to be quite there just yet, but seeing that those
> >>>>> start
> >>>>>> is a good sign, and there is a good chance we can do some things
> >>> there.
> >>>>>> I am not sure whether we should let bots trigger releases, because
> >>>>>> a
> >>>>> final
> >>>>>> human look at things isn't a bad thing, especially given the
> >>>>>> popularity
> >>>>> of
> >>>>>> software supply chain attacks recently.
> >>>>>>
> >>>>>>
> >>>>>> I do share Chesnay's concerns about complexity in tooling, though.
> >>>>>> Both release tooling and test tooling. They are not incompatible
> >>>>>> with that approach, but they are a task we need to tackle during
> >>>>>> this change which will add additional work.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org>
> >>> wrote:
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> I think some questions came up and I'd like to address the
> >>>>>>> question of
> >>>>>> the
> >>>>>>> timing.
> >>>>>>>
> >>>>>>> Could you clarify what release cadence you're thinking of?
> >>>>>>> There's
> >>>>> quite
> >>>>>>>> a big range that fits "more frequent than Flink" (per-commit,
> >>>>>>>> daily, weekly, bi-weekly, monthly, even bi-monthly).
> >>>>>>> The short answer is: as often as needed:
> >>>>>>> - If there is a CVE in a dependency and we need to bump it -
> >>>>>>> release immediately.
> >>>>>>> - If there is a new feature merged, release soonish. We may
> >>>>>>> collect a
> >>>>> few
> >>>>>>> successive features before a release.
> >>>>>>> - If there is a bugfix, release immediately or soonish depending
> >>>>>>> on
> >>>>> the
> >>>>>>> severity and if there are workarounds available.
> >>>>>>>
> >>>>>>> We should not limit ourselves; the whole idea of independent
> >>>>>>> releases
> >>>>> is
> >>>>>>> exactly that you release as needed. There is no release planning
> >>>>>>> or anything needed, you just go with a release as if it was an
> >>>>>>> external artifact.
> >>>>>>>
> >>>>>>> (1) is the connector API already stable?
> >>>>>>>>  From another discussion thread [1], connector API is far from
> >>>>> stable.
> >>>>>>>> Currently, it's hard to build connectors against multiple Flink
> >>>>>> versions.
> >>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and 1.13 ->
> >>>>>>>> 1.14
> >>>>>> and
> >>>>>>>>   maybe also in the future versions,  because Table related APIs
> >>>>>>>> are
> >>>>>> still
> >>>>>>>> @PublicEvolving and new Sink API is still @Experimental.
> >>>>>>>>
> >>>>>>> The question is: what is stable in an evolving system? We
> >>>>>>> recently discovered that the old SourceFunction needed to be
> >>>>>>> refined such that cancellation works correctly [1]. So that
> >>>>>>> interface has been in Flink for
> >>>>> 7
> >>>>>>> years, heavily used also outside, and we still had to change the
> >>>>> contract
> >>>>>>> in a way that I'd expect any implementer to recheck their
> >>>>> implementation.
> >>>>>>> It might not be necessary to change anything and you can probably
> >>>>> change
> >>>>>>> the code for all Flink versions but still, the interface was
> >>>>>>> not
> >>>>>> stable
> >>>>>>> in the closest sense.
> >>>>>>>
> >>>>>>> If we focus just on API changes on the unified interfaces, then
> >>>>>>> we
> >>>>> expect
> >>>>>>> one more change to Sink API to support compaction. For Table API,
> >>>>> there
> >>>>>>> will most likely also be some changes in 1.15. So we could wait
> >>>>>>> for
> >>>>> 1.15.
> >>>>>>> But I'm questioning if that's really necessary because we will
> >>>>>>> add
> >>>>> more
> >>>>>>> functionality beyond 1.15 without breaking API. For example, we
> >>>>>>> may
> >>>>> add
> >>>>>>> more unified connector metrics. If you want to use it in your
> >>>>> connector,
> >>>>>>> you have to support multiple Flink versions anyhow. So rather
> >>>>>>> then
> >>>>>> focusing
> >>>>>>> the discussion on "when is stuff stable", I'd rather focus on
> >>>>>>> "how
> >>>>> can we
> >>>>>>> support building connectors against multiple Flink versions" and
> >>>>>>> make
> >>>>> it
> >>>>>> as
> >>>>>>> painless as possible.
> >>>>>>>
> >>>>>>> Chesnay pointed out to use different branches for different Flink
> >>>>>> versions
> >>>>>>> which sounds like a good suggestion. With a mono-repo, we can't
> >>>>>>> use branches differently anyways (there is no way to have release
> >>>>>>> branches
> >>>>>> per
> >>>>>>> connector without chaos). In these branches, we could provide
> >>>>>>> shims to simulate future features in older Flink versions such
> >>>>>>> that code-wise,
> >>>>> the
> >>>>>>> source code of a specific connector may not diverge (much). For
> >>>>> example,
> >>>>>> to
> >>>>>>> register unified connector metrics, we could simulate the current
> >>>>>> approach
> >>>>>>> also in some utility package of the mono-repo.
> >>>>>>>
> >>>>>>> I see the stable core Flink API as a prerequisite for modularity.
> >>>>>>> And
> >>>>>>>> for connectors it is not just the source and sink API (source
> >>>>>>>> being stable as of 1.14), but everything that is required to
> >>>>>>>> build and maintain a connector downstream, such as the test
> >>>>>>>> utilities and infrastructure.
> >>>>>>>>
> >>>>>>> That is a very fair point. I'm actually surprised to see that
> >>>>>>> MiniClusterWithClientResource is not public. I see it being used
> >>>>>>> in
> >>>>> all
> >>>>>>> connectors, especially outside of Flink. I fear that as long as
> >>>>>>> we do
> >>>>> not
> >>>>>>> have connectors outside, we will not properly annotate and
> >>>>>>> maintain
> >>>>> these
> >>>>>>> utilities in a classic hen-and-egg problem. I will outline an idea
> >>>>>>> at
> >>>>> the
> >>>>>>> end.
> >>>>>>>
> >>>>>>>> the connectors need to be adopted and require at least one
> >>>>>>>> release
> >>>>> per
> >>>>>>>> Flink minor release.
> >>>>>>>> However, this will make the releases of connectors slower, e.g.
> >>>>>> maintain
> >>>>>>>> features for multiple branches and release multiple branches.
> >>>>>>>> I think the main purpose of having an external connector
> >>>>>>>> repository
> >>>>> is
> >>>>>> in
> >>>>>>>> order to have "faster releases of connectors"?
> >>>>>>>>
> >>>>>>>> Imagine a project with a complex set of dependencies. Let's say
> >>>>> Flink
> >>>>>>>> version A plus Flink reliant dependencies released by other
> >>>>>>>> projects (Flink-external connectors, Beam, Iceberg, Hudi, ..).
> >>>>>>>> We don't want
> >>>>> a
> >>>>>>>> situation where we bump the core Flink version to B and things
> >>>>>>>> fall apart (interface changes, utilities that were useful but
> >>>>>>>> not public, transitive dependencies etc.).
> >>>>>>>>
> >>>>>>> Yes, that's why I wanted to automate the processes more which is
> >>>>>>> not
> >>>>> that
> >>>>>>> easy under ASF. Maybe we automate the source provision across
> >>>>> supported
> >>>>>>> versions and have 1 vote thread for all versions of a connector?
> >>>>>>>
> >>>>>>>  From the perspective of CDC connector maintainers, the biggest
> >>>>> advantage
> >>>>>> of
> >>>>>>>> maintaining it outside of the Flink project is that:
> >>>>>>>> 1) we can have a more flexible and faster release cycle
> >>>>>>>> 2) we can be more liberal with committership for connector
> >>>>> maintainers
> >>>>>>>> which can also attract more committers to help the release.
> >>>>>>>>
> >>>>>>>> Personally, I think maintaining one connector repository under
> >>>>>>>> the
> >>>>> ASF
> >>>>>>> may
> >>>>>>>> not have the above benefits.
> >>>>>>>>
> >>>>>>> Yes, I also feel that ASF is too restrictive for our needs. But
> >>>>>>> it
> >>>>> feels
> >>>>>>> like there are too many that see it differently and I think we
> >>>>>>> need
> >>>>>>>
> >>>>>>> (2) Flink testability without connectors.
> >>>>>>>> This is a very good question. How can we guarantee the new
> >>>>>>>> Source
> >>>>> and
> >>>>>>> Sink
> >>>>>>>> API are stable with only test implementation?
> >>>>>>>>
> >>>>>>> We can't and shouldn't. Since the connector repo is managed by
> >>>>>>> Flink,
> >>>>> a
> >>>>>>> Flink release manager needs to check if the Flink connectors are
> >>>>> actually
> >>>>>>> working prior to creating an RC. That's similar to how
> >>>>>>> flink-shaded
> >>>>> and
> >>>>>>> flink core are related.
> >>>>>>>
> >>>>>>>
> >>>>>>> So here is one idea that I had to get things rolling. We are
> >>>>>>> going to address the external repo iteratively without
> >>>>>>> compromising what we
> >>>>>> already
> >>>>>>> have:
> >>>>>>> 1.Phase, add new contributions to external repo. We use that time
> >>>>>>> to
> >>>>>> setup
> >>>>>>> infra accordingly and optimize release processes. We will
> >>>>>>> identify
> >>>>> test
> >>>>>>> utilities that are not yet public/stable and fix that.
> >>>>>>> 2.Phase, add ports to the new unified interfaces of existing
> >>>>> connectors.
> >>>>>>> That requires a previous Flink release to make utilities stable.
> >>>>>>> Keep
> >>>>> old
> >>>>>>> interfaces in flink-core.
> >>>>>>> 3.Phase, remove old interfaces in flink-core of some connectors
> >>>>>>> (tbd
> >>>>> at a
> >>>>>>> later point).
> >>>>>>> 4.Phase, optionally move all remaining connectors (tbd at a later
> >>>>> point).
> >>>>>>> I'd envision having ~3 months between starting the different
> >>>>> phases.
> >>>>>>> WDYT?
> >>>>>>>
> >>>>>>>
> >>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-23527
> >>>>>>>
> >>>>>>> On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io>
> >>>>> wrote:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> My name is Kyle and I’m an open source developer primarily
> >>>>>>>> focused
> >>>>> on
> >>>>>>>> Apache Iceberg.
> >>>>>>>>
> >>>>>>>> I’m happy to help clarify or elaborate on any aspect of our
> >>>>> experience
> >>>>>>>> working on a relatively decoupled connector that is downstream
> >>>>>>>> and
> >>>>>> pretty
> >>>>>>>> popular.
> >>>>>>>>
> >>>>>>>> I’d also love to be able to contribute or assist in any way I
> >> can.
> >>>>>>>> I don’t mean to thread jack, but are there any meetings or
> >>>>>>>> community
> >>>>>> sync
> >>>>>>>> ups, specifically around the connector APIs, that I might join
> >>>>>>>> / be
> >>>>>>> invited
> >>>>>>>> to?
> >>>>>>>>
> >>>>>>>> I did want to add that even though I’ve experienced some of the
> >>>>>>>> pain
> >>>>>>> points
> >>>>>>>> of integrating with an evolving system / API (catalog support
> >>>>>>>> is
> >>>>>>> generally
> >>>>>>>> speaking pretty new everywhere really in this space), I also
> >>>>>>>> agree personally that you shouldn’t slow down development
> >>>>>>>> velocity too
> >>>>> much
> >>>>>> for
> >>>>>>>> the sake of external connector. Getting to a performant and
> >>>>>>>> stable
> >>>>>> place
> >>>>>>>> should be the primary goal, and slowing that down to support
> >>>>> stragglers
> >>>>>>>> will (in my personal opinion) always be a losing game. Some
> >>>>>>>> folks
> >>>>> will
> >>>>>>>> simply stay behind on versions regardless until they have to
> >>>>> upgrade.
> >>>>>>>> I am working on ensuring that the Iceberg community stays
> >>>>>>>> within 1-2 versions of Flink, so that we can help provide more
> >>>>>>>> feedback or
> >>>>>>> contribute
> >>>>>>>> things that might make our ability to support multiple Flink
> >>>>> runtimes /
> >>>>>>>> versions with one project / codebase and minimal to no
> >>>>>>>> reflection
> >>>>> (our
> >>>>>>>> desired goal).
> >>>>>>>>
> >>>>>>>> If there’s anything I can do or any way I can be of assistance,
> >>>>> please
> >>>>>>>> don’t hesitate to reach out. Or find me on ASF slack 😀
> >>>>>>>>
> >>>>>>>> I greatly appreciate your general concern for the needs of
> >>>>> downstream
> >>>>>>>> connector integrators!
> >>>>>>>>
> >>>>>>>> Cheers
> >>>>>>>> Kyle Bendickson (GitHub: kbendick) Open Source Developer kyle
> >>>>>>>> [at] tabular [dot] io
> >>>>>>>>
> >>>>>>>> On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org>
> >>>>> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I see the stable core Flink API as a prerequisite for
> >>> modularity.
> >>>>> And
> >>>>>>>>> for connectors it is not just the source and sink API (source
> >>>>> being
> >>>>>>>>> stable as of 1.14), but everything that is required to build
> >>>>>>>>> and maintain a connector downstream, such as the test
> >>>>>>>>> utilities and infrastructure.
> >>>>>>>>>
> >>>>>>>>> Without the stable surface of core Flink, changes will leak
> >>>>>>>>> into downstream dependencies and force lock step updates.
> >>>>>>>>> Refactoring across N repos is more painful than a single
> >>>>>>>>> repo. Those with experience developing downstream of Flink
> >>>>>>>>> will know the pain, and
> >>>>>> that
> >>>>>>>>> isn't limited to connectors. I don't remember a Flink "minor
> >>>>> version"
> >>>>>>>>> update that was just a dependency version change and did not
> >>>>>>>>> force other downstream changes.
> >>>>>>>>>
> >>>>>>>>> Imagine a project with a complex set of dependencies. Let's
> >>>>>>>>> say
> >>>>> Flink
> >>>>>>>>> version A plus Flink reliant dependencies released by other
> >>>>> projects
> >>>>>>>>> (Flink-external connectors, Beam, Iceberg, Hudi, ..). We
> >>>>>>>>> don't
> >>>>> want a
> >>>>>>>>> situation where we bump the core Flink version to B and
> >>>>>>>>> things
> >>>>> fall
> >>>>>>>>> apart (interface changes, utilities that were useful but not
> >>>>> public,
> >>>>>>>>> transitive dependencies etc.).
> >>>>>>>>>
> >>>>>>>>> The discussion here also highlights the benefits of keeping
> >>>>> certain
> >>>>>>>>> connectors outside Flink. Whether that is due to difference
> >>>>>>>>> in developer community, maturity of the connectors, their
> >>>>>>>>> specialized/limited usage etc. I would like to see that as a
> >>>>>>>>> sign
> >>>>> of
> >>>>>> a
> >>>>>>>>> growing ecosystem and most of the ideas that Arvid has put
> >>>>>>>>> forward would benefit further growth of the connector
> >> ecosystem.
> >>>>>>>>> As for keeping connectors within Apache Flink: I prefer that
> >>>>>>>>> as
> >>>>> the
> >>>>>>>>> path forward for "essential" connectors like FileSource,
> >>>>> KafkaSource,
> >>>>>>>>> ... And we can still achieve a more flexible and faster
> >>>>>>>>> release
> >>>>>> cycle.
> >>>>>>>>> Thanks,
> >>>>>>>>> Thomas
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com>
> >>> wrote:
> >>>>>>>>>> Hi Konstantin,
> >>>>>>>>>>
> >>>>>>>>>>> the connectors need to be adopted and require at least
> >>>>>>>>>>> one
> >>>>>> release
> >>>>>>>> per
> >>>>>>>>>> Flink minor release.
> >>>>>>>>>> However, this will make the releases of connectors slower,
> >>> e.g.
> >>>>>>>> maintain
> >>>>>>>>>> features for multiple branches and release multiple
> >> branches.
> >>>>>>>>>> I think the main purpose of having an external connector
> >>>>> repository
> >>>>>>> is
> >>>>>>>> in
> >>>>>>>>>> order to have "faster releases of connectors"?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  From the perspective of CDC connector maintainers, the
> >>>>>>>>>> biggest
> >>>>>>>> advantage
> >>>>>>>>> of
> >>>>>>>>>> maintaining it outside of the Flink project is that:
> >>>>>>>>>> 1) we can have a more flexible and faster release cycle
> >>>>>>>>>> 2) we can be more liberal with committership for connector
> >>>>>>> maintainers
> >>>>>>>>>> which can also attract more committers to help the release.
> >>>>>>>>>>
> >>>>>>>>>> Personally, I think maintaining one connector repository
> >>>>>>>>>> under
> >>>>> the
> >>>>>>> ASF
> >>>>>>>>> may
> >>>>>>>>>> not have the above benefits.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Jark
> >>>>>>>>>>
> >>>>>>>>>> On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <
> >>>>> kna...@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>
> >>>>>>>>>>> regarding the stability of the APIs. I think everyone
> >>>>>>>>>>> agrees
> >>>>> that
> >>>>>>>>>>> connector APIs which are stable across minor versions
> >>>>>> (1.13->1.14)
> >>>>>>>> are
> >>>>>>>>> the
> >>>>>>>>>>> mid-term goal. But:
> >>>>>>>>>>>
> >>>>>>>>>>> a) These APIs are still quite young, and we shouldn't
> >>>>>>>>>>> make
> >>>>> them
> >>>>>>>> @Public
> >>>>>>>>>>> prematurely either.
> >>>>>>>>>>>
> >>>>>>>>>>> b) Isn't this *mostly* orthogonal to where the connector
> >>>>>>>>>>> code
> >>>>>>> lives?
> >>>>>>>>> Yes,
> >>>>>>>>>>> as long as there are breaking changes, the connectors
> >>>>>>>>>>> need to
> >>>>> be
> >>>>>>>>> adopted
> >>>>>>>>>>> and require at least one release per Flink minor release.
> >>>>>>>>>>> Documentation-wise this can be addressed via a
> >>>>>>>>>>> compatibility
> >>>>>> matrix
> >>>>>>>> for
> >>>>>>>>>>> each connector as Arvid suggested. IMO we shouldn't block
> >>>>>>>>>>> this
> >>>>>>> effort
> >>>>>>>>> on
> >>>>>>>>>>> the stability of the APIs.
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>>
> >>>>>>>>>>> Konstantin
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Oct 20, 2021 at 8:56 AM Jark Wu
> >>>>>>>>>>> <imj...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think Thomas raised very good questions and would like
> >>>>>>>>>>>> to
> >>>>> know
> >>>>>>>> your
> >>>>>>>>>>>> opinions if we want to move connectors out of flink in
> >>>>>>>>>>>> this
> >>>>>>> version.
> >>>>>>>>>>>> (1) is the connector API already stable?
> >>>>>>>>>>>>> Separate releases would only make sense if the core
> >>>>>>>>>>>>> Flink
> >>>>>>> surface
> >>>>>>>> is
> >>>>>>>>>>>>> fairly stable though. As evident from Iceberg (and
> >>>>>>>>>>>>> also
> >>>>> Beam),
> >>>>>>>>> that's
> >>>>>>>>>>>>> not the case currently. We should probably focus on
> >>>>> addressing
> >>>>>>> the
> >>>>>>>>>>>>> stability first, before splitting code. A success
> >>>>>>>>>>>>> criteria
> >>>>>> could
> >>>>>>>> be
> >>>>>>>>>>>>> that we are able to build Iceberg and Beam against
> >>>>>>>>>>>>> multiple
> >>>>>>> Flink
> >>>>>>>>>>>>> versions w/o the need to change code. The goal would
> >>>>>>>>>>>>> be
> >>>>> that
> >>>>>> no
> >>>>>>>>>>>>> connector breaks when we make changes to Flink core.
> >>>>>>>>>>>>> Until
> >>>>>>> that's
> >>>>>>>>> the
> >>>>>>>>>>>>> case, code separation creates a setup where 1+1 or N+1
> >>>>>>>> repositories
> >>>>>>>>>>>>> need to move lock step.
> >>>>>>>>>>>>  From another discussion thread [1], connector API is far
> >>>>>>>>>>>> from
> >>>>>>>> stable.
> >>>>>>>>>>>> Currently, it's hard to build connectors against
> >>>>>>>>>>>> multiple
> >>>>> Flink
> >>>>>>>>> versions.
> >>>>>>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and
> >>>>>>>>>>>> 1.13
> >>>>> ->
> >>>>>>> 1.14
> >>>>>>>>> and
> >>>>>>>>>>>>   maybe also in the future versions,  because Table
> >>>>>>>>>>>> related
> >>>>> APIs
> >>>>>>> are
> >>>>>>>>> still
> >>>>>>>>>>>> @PublicEvolving and new Sink API is still @Experimental.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> (2) Flink testability without connectors.
> >>>>>>>>>>>>> Flink w/o Kafka connector (and few others) isn't
> >>>>>>>>>>>>> viable. Testability of Flink was already brought up,
> >>>>>>>>>>>>> can we
> >>>>>>> really
> >>>>>>>>>>>>> certify a Flink core release without Kafka connector?
> >>>>>>>>>>>>> Maybe
> >>>>>>> those
> >>>>>>>>>>>>> connectors that are used in Flink e2e tests to
> >>>>>>>>>>>>> validate
> >>>>>>>>> functionality
> >>>>>>>>>>>>> of core Flink should not be broken out?
> >>>>>>>>>>>> This is a very good question. How can we guarantee the
> >>>>>>>>>>>> new
> >>>>>> Source
> >>>>>>>> and
> >>>>>>>>> Sink
> >>>>>>>>>>>> API are stable with only test implementation?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Jark
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <
> >>>>>>> ches...@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Could you clarify what release cadence you're thinking
> >>> of?
> >>>>>>> There's
> >>>>>>>>> quite
> >>>>>>>>>>>>> a big range that fits "more frequent than Flink"
> >>>>> (per-commit,
> >>>>>>>> daily,
> >>>>>>>>>>>>> weekly, bi-weekly, monthly, even bi-monthly).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 19/10/2021 14:15, Martijn Visser wrote:
> >>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think it would be a huge benefit if we can achieve
> >>>>>>>>>>>>>> more
> >>>>>>>> frequent
> >>>>>>>>>>>>> releases
> >>>>>>>>>>>>>> of connectors, which are not bound to the release
> >>>>>>>>>>>>>> cycle
> >>>>> of
> >>>>>>> Flink
> >>>>>>>>>>>> itself.
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>> agree that in order to get there, we need to have
> >>>>>>>>>>>>>> stable
> >>>>>>>>> interfaces
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>> are trustworthy and reliable, so they can be safely
> >>>>>>>>>>>>>> used
> >>>>> by
> >>>>>>>> those
> >>>>>>>>>>>>>> connectors. I do think that work still needs to be
> >>>>>>>>>>>>>> done
> >>>>> on
> >>>>>>> those
> >>>>>>>>>>>>>> interfaces, but I am confident that we can get there
> >>>>> from a
> >>>>>>>> Flink
> >>>>>>>>>>>>>> perspective.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I am worried that we would not be able to achieve
> >>>>>>>>>>>>>> those
> >>>>>>> frequent
> >>>>>>>>>>>> releases
> >>>>>>>>>>>>>> of connectors if we are putting these connectors
> >>>>>>>>>>>>>> under
> >>>>> the
> >>>>>>>> Apache
> >>>>>>>>>>>>> umbrella,
> >>>>>>>>>>>>>> because that means that for each connector release
> >>>>>>>>>>>>>> we
> >>>>> have
> >>>>>> to
> >>>>>>>>> follow
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> Apache release creation process. This requires a lot
> >>>>>>>>>>>>>> of
> >>>>>> manual
> >>>>>>>>> steps
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> prohibits automation and I think it would be hard to
> >>>>> scale
> >>>>>> out
> >>>>>>>>>>>> frequent
> >>>>>>>>>>>>>> releases of connectors. I'm curious how others think
> >>>>>>>>>>>>>> this
> >>>>>>>>> challenge
> >>>>>>>>>>>> could
> >>>>>>>>>>>>>> be solved.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, 18 Oct 2021 at 22:22, Thomas Weise <
> >>>>> t...@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>> Thanks for initiating this discussion.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There are definitely a few things that are not
> >>>>>>>>>>>>>>> optimal
> >>>>> with
> >>>>>>> our
> >>>>>>>>>>>>>>> current management of connectors. I would not
> >>>>> necessarily
> >>>>>>>>>>>> characterize
> >>>>>>>>>>>>>>> it as a "mess" though. As the points raised so far
> >>>>> show, it
> >>>>>>>> isn't
> >>>>>>>>>>>> easy
> >>>>>>>>>>>>>>> to find a solution that balances competing
> >>>>>>>>>>>>>>> requirements
> >>>>> and
> >>>>>>>>> leads to
> >>>>>>>>>>>> a
> >>>>>>>>>>>>>>> net improvement.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It would be great if we can find a setup that
> >>>>>>>>>>>>>>> allows for
> >>>>>>>>> connectors
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>> be released independently of core Flink and that
> >>>>>>>>>>>>>>> each
> >>>>>>> connector
> >>>>>>>>> can
> >>>>>>>>>>>> be
> >>>>>>>>>>>>>>> released separately. Flink already has separate
> >>>>>>>>>>>>>>> releases (flink-shaded), so that by itself isn't a
> >>> new thing.
> >>>>>>>>> Per-connector
> >>>>>>>>>>>>>>> releases would need to allow for more frequent
> >>>>>>>>>>>>>>> releases
> >>>>>>>> (without
> >>>>>>>>> the
> >>>>>>>>>>>>>>> baggage that a full Flink release comes with).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Separate releases would only make sense if the core
> >>>>> Flink
> >>>>>>>>> surface is
> >>>>>>>>>>>>>>> fairly stable though. As evident from Iceberg (and
> >>>>>>>>>>>>>>> also
> >>>>>>> Beam),
> >>>>>>>>> that's
> >>>>>>>>>>>>>>> not the case currently. We should probably focus on
> >>>>>>> addressing
> >>>>>>>>> the
> >>>>>>>>>>>>>>> stability first, before splitting code. A success
> >>>>> criteria
> >>>>>>>> could
> >>>>>>>>> be
> >>>>>>>>>>>>>>> that we are able to build Iceberg and Beam against
> >>>>> multiple
> >>>>>>>> Flink
> >>>>>>>>>>>>>>> versions w/o the need to change code. The goal
> >>>>>>>>>>>>>>> would be
> >>>>>> that
> >>>>>>> no
> >>>>>>>>>>>>>>> connector breaks when we make changes to Flink core.
> >>>>> Until
> >>>>>>>>> that's the
> >>>>>>>>>>>>>>> case, code separation creates a setup where 1+1 or
> >>>>>>>>>>>>>>> N+1
> >>>>>>>>> repositories
> >>>>>>>>>>>>>>> need to move lock step.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regarding some connectors being more important for
> >>>>>>>>>>>>>>> Flink
> >>>>>> than
> >>>>>>>>> others:
> >>>>>>>>>>>>>>> That's a fact. Flink w/o Kafka connector (and few
> >>>>> others)
> >>>>>>> isn't
> >>>>>>>>>>>>>>> viable. Testability of Flink was already brought
> >>>>>>>>>>>>>>> up,
> >>>>> can we
> >>>>>>>>> really
> >>>>>>>>>>>>>>> certify a Flink core release without Kafka
> >> connector?
> >>>>> Maybe
> >>>>>>>> those
> >>>>>>>>>>>>>>> connectors that are used in Flink e2e tests to
> >>>>>>>>>>>>>>> validate
> >>>>>>>>> functionality
> >>>>>>>>>>>>>>> of core Flink should not be broken out?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Finally, I think that the connectors that move into
> >>>>>> separate
> >>>>>>>>> repos
> >>>>>>>>>>>>>>> should remain part of the Apache Flink project.
> >>>>>>>>>>>>>>> Larger
> >>>>>>>>> organizations
> >>>>>>>>>>>>>>> tend to approve the use of and contribution to open
> >>>>> source
> >>>>>> at
> >>>>>>>> the
> >>>>>>>>>>>>>>> project level. Sometimes it is everything ASF. More
> >>>>> often
> >>>>>> it
> >>>>>>> is
> >>>>>>>>>>>>>>> "Apache Foo". It would be fatal to end up with a
> >>>>> patchwork
> >>>>>> of
> >>>>>>>>>>>> projects
> >>>>>>>>>>>>>>> with potentially different licenses and governance
> >>>>>>>>>>>>>>> to
> >>>>>> arrive
> >>>>>>>> at a
> >>>>>>>>>>>>>>> working Flink setup. This may mean we prioritize
> >>>>> usability
> >>>>>>> over
> >>>>>>>>>>>>>>> developer convenience, if that's in the best
> >>>>>>>>>>>>>>> interest of
> >>>>>>> Flink
> >>>>>>>>> as a
> >>>>>>>>>>>>>>> whole.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Thomas
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <
> >>>>>>>>> ches...@apache.org
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>> Generally, the issues are reproducibility and
> >>> control.
> >>>>>>>>>>>>>>>> Stuffs completely broken on the Flink side for a
> >>> week?
> >>>>>> Well
> >>>>>>>>> then so
> >>>>>>>>>>>> are
> >>>>>>>>>>>>>>>> the connector repos.
> >>>>>>>>>>>>>>>> (As-is) You can't go back to a previous version of
> >>>>>>>>>>>>>>>> the
> >>>>>>>> snapshot.
> >>>>>>>>>>>> Which
> >>>>>>>>>>>>>>>> also means that checking out older commits can be
> >>>>>>> problematic
> >>>>>>>>>>>> because
> >>>>>>>>>>>>>>>> you'd still work against the latest snapshots, and
> >>>>>>>>>>>>>>>> they
> >>>>>> not
> >>>>>>> be
> >>>>>>>>>>>>>>>> compatible with each other.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 18/10/2021 15:22, Arvid Heise wrote:
> >>>>>>>>>>>>>>>>> I was actually betting on snapshots versions.
> >>>>>>>>>>>>>>>>> What are
> >>>>>> the
> >>>>>>>>> limits?
> >>>>>>>>>>>>>>>>> Obviously, we can only do a release of a 1.15
> >>>>> connector
> >>>>>>> after
> >>>>>>>>> 1.15
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>> release.
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> Konstantin Knauf
> >>>>>>>>>>>
> >>>>>>>>>>> https://twitter.com/snntrable
> >>>>>>>>>>>
> >>>>>>>>>>> https://github.com/knaufk
> >>>>>>>>>>>
>
>
