Hi everyone,

If you have any more comments or questions, do let me know. Otherwise I'll open up a vote thread next week.
Best regards,

Martijn

On Tue, 11 Jan 2022 at 20:13, Martijn Visser <mart...@ververica.com> wrote:

Good question: we want to use the same setup as we currently have for Flink, so using the existing CI infrastructure.

On Mon, 10 Jan 2022 at 11:19, Chesnay Schepler <ches...@apache.org> wrote:

What CI resources do you actually intend to use? Asking since the ASF GHA resources are AFAIK quite overloaded.

On 05/01/2022 11:48, Martijn Visser wrote:

Hi everyone,

I wanted to summarise the email thread and see if there are any open items that still need to be discussed before we can finalise the discussion in this email thread:

1. About having multiple connectors in one repo, or each connector in its own repository

As explained by @Arvid Heise <ar...@apache.org>, we ultimately propose to have a single repository per connector, which seems to be favoured in the community.

2. About having the connector repositories under the ASF or not

The consensus is that all connectors would remain under the ASF.

I think we can categorise the remaining questions and concerns as the following:

3. How would we set up the testing?

We need to make sure that we provide a proper testing framework, which means that we provide a public Source and Sink testing framework. As mentioned extensively in the thread, we need to make sure that the necessary interfaces are properly annotated and at least @PublicEvolving. This also includes the test infrastructure, like MiniCluster. For the latter, we don't know exactly yet how to balance having publicly available test infrastructure vs. being able to iterate inside of Flink, but we can all agree this has to be solved.

For testing infrastructure, we would like to use GitHub Actions.
In the current state, it probably makes sense for a connector repo to follow the branching strategy of Flink. That will ensure a match between the released connector and Flink version. This should change once all the Flink interfaces have stabilised, so that you can use a connector with multiple Flink versions. That means we should have a nightly build test for:

- The `main` branch of the connector (which would be the unreleased version) against the `master` branch of Flink (the unreleased version of Flink).
- Any supported `release-X.YY` branch of the connector against the `release-X.YY` branch of Flink.

We should also have smoke-test E2E tests in Flink (one for DataStream, one for Table, one for SQL, one for Python) which load all the connectors and run an arbitrary test (post data on a source, load it into Flink, sink the output, and check that the output is as expected).

4. How would we integrate documentation?

Documentation for a connector should probably end up in the connector repository. The Flink website should contain one entrance to all connectors (so not the current approach where we have connectors per DataStream API, Table API, etc.). Each connector's documentation should end up as one menu item under connectors, containing all necessary information for the DataStream, Table, SQL and Python implementations.

5. Which connectors should end up in the external connector repo?

I'll open up a separate thread on this topic to have a parallel discussion on that. We should reach consensus on both threads before we can move forward on this topic as a whole.

Best regards,

Martijn

On Fri, 10 Dec 2021 at 04:47, Thomas Weise <t...@apache.org> wrote:

+1 for repo per connector from my side also.

Thanks for trying out the different approaches.
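The nightly-build matrix described above (connector `main` against Flink `master`, and matching `release-X.YY` branches against each other) can be sketched roughly as follows. The branch names are illustrative and a real setup would live in the CI configuration rather than a script:

```shell
# Sketch: derive the (connector branch, Flink branch) pairs a nightly
# build would test, following the strategy from the thread.
# Branch names below are illustrative examples, not real release lines.
connector_branches="main release-1.14 release-1.13"
for branch in $connector_branches; do
  if [ "$branch" = "main" ]; then
    flink_branch="master"   # unreleased connector vs. unreleased Flink
  else
    flink_branch="$branch"  # a release line tracks the matching Flink release line
  fi
  echo "nightly: connector/$branch vs flink/$flink_branch"
done
```

This only prints the matrix; each emitted pair would correspond to one nightly CI job.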
Where would the common/infra pieces live? In a separate repository with its own release?

Thomas

On Thu, Dec 9, 2021 at 12:42 PM Till Rohrmann <trohrm...@apache.org> wrote:

Sorry if I was a bit unclear. +1 for the single repo per connector approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org> wrote:

+1 for the single repo approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> wrote:

I also agree that it feels more natural to go with a repo for each individual connector. Each repository can be made available at flink-packages.org so users can find them, next to referring to them in documentation. +1 from my side.

On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:

Hi all,

We tried out Chesnay's proposal and went with Option 2. Unfortunately, we ran into some tough nuts to crack and feel like we hit a dead end:

- The main pain point with the outlined Frankensteinian connector repo is how to handle shared code / infra code. If we have it in some <common> branch, then we need to merge the common branch into the connector branch on update. However, it's unclear to me how improvements to the common branch that naturally appear while working on a specific connector go back into the common branch. You can't use a pull request from your branch, or else your connector code would poison the connector-less common branch. So you would probably manually copy the files over to a common branch and create a PR branch for that.
- A weird solution could be to have the common branch as a submodule in the repo itself (if that's even possible). I'm sure that this setup would blow the minds of all newcomers.
- Similarly, it's mandatory to have safeguards against code from connector A poisoning connector B, common, or main. I had a similar setup in the past and code from two "distinct" branch types constantly swept over.
- We could also say that we simply release <common> independently and just have a Maven (SNAPSHOT) dependency on it. But that would create a weird flow if you need to change something in common, where you need to constantly switch branches back and forth.
- In general, the Frankensteinian approach is very switch-intensive. If you maintain 3 connectors and need to fix one build instability in each at the same time (quite common nowadays for some reason) and you have 2 review rounds, you need to switch branches 9 times, ignoring changes to common.

Additionally, we still have the rather user/dev-unfriendly main branch that is mostly empty. I'm also not sure we can generate an overview README.md to make it more friendly here, because in theory every connector branch should be based on main, and we would get merge conflicts.

I'd like to propose once again to go with individual repositories.

- The only downside that we discussed so far is that we have more initial setup to do. Since we would organically grow the number of connectors/repositories, that load is quite distributed. We can offer templates after finding a good approach, which can even be used by outside organizations.
- Regarding secrets, I think it's actually an advantage that the Kafka connector has no access to the AWS secrets. If there are secrets to be shared across connectors, we can and should use Azure's Variable Groups (I have used them in the past to share Nexus creds across repos). That would also make rotation easy.
- Working on different connectors would be rather easy, as all modern IDEs support multi-repo setups in the same project. You still need to do multiple releases in case you update common code (accessed either through Nexus or a git submodule) and want to release your connector.
- There is no difference in how many CI runs there are between the two approaches.
- Individual repositories also have the advantage of allowing external incubation. Let's assume someone builds connector A and hosts it in their organization (a very common setup). If they want to contribute the code to Flink, we could simply transfer the repository into the ASF after ensuring Flink coding standards. We would then retain the git history and GitHub issues.

Is there any point that I'm missing?

On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> wrote:

For sharing workflows we should be able to use composite actions. We'd have the main definition files in the flink-connectors repo, which we also need to tag/release, and which other branches/repos can then import. These are also versioned, so we don't have to worry about accidentally breaking stuff.
These could also be used to enforce certain standards/interfaces, such that we can automate more things (e.g., integration into the Flink documentation).

It is true that Option 2 and dedicated repositories share a lot of properties. While I did say in an offline conversation that in that case we might just as well use separate repositories, I'm not so sure anymore. One repo would make administration a bit easier; for example, secrets wouldn't have to be applied to each repo (we wouldn't want certain secrets to be set up organization-wide). I overall also like that one repo would present a single access point; you can't "miss" a connector repo, and I would hope that having it as one repo would nurture more collaboration between the connectors, which after all need to solve similar problems.

It is a fair point that the branching model would be quite weird, but I think that would subside pretty quickly.

Personally I'd go with Option 2, and if that doesn't work out we can still split the repo later on. (Which should then be a trivial matter of copying all <connector>/* branches and renaming them.)

On 26/11/2021 12:47, Till Rohrmann wrote:

Hi Arvid,

Thanks for updating this thread with the latest findings. The described limitations for a single connector repo sound suboptimal to me.

* Option 2 sounds as if we try to simulate multiple connector repos inside of a single repo.
I also don't know how we would share code between the different branches (sharing infrastructure would probably be easier, though). This seems to have the same limitations as dedicated repos, with the downside of a not very intuitive branching model.
* Isn't option 1 kind of a degenerate version of option 2, where we have some unrelated code from other connectors in the individual connector branches?
* Option 3 has the downside that someone creating a release has to release all connectors. This means that she either has to sync with the different connector maintainers or has to be able to release all connectors on her own. We are already seeing in the Flink community that releases require quite good communication/coordination between the different people working on different Flink components. Given our goals to make connector releases easier and more frequent, I think that coupling different connector releases might be counter-productive.

To me it does not sound very practical to use a mono-repository without some more advanced build infrastructure that, for example, allows different git roots in different connector directories. Maybe the mono-repo can be a catch-all repository for connectors that want to be released in lock-step (Option 3) with all the other connectors the repo contains. But for connectors that change frequently, having a dedicated repository that allows independent releases sounds preferable to me.
What utilities and infrastructure code do you intend to share? Using git submodules can definitely be one option to share code. However, it might also be OK to depend on flink-connector-common artifacts, which could make things easier. Where I am unsure is whether git submodules can be used to share infrastructure code (e.g. the .github/workflows), because you need these files in the repo to trigger the CI infrastructure.

Cheers,
Till

On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:

Hi Brian,

Thank you for sharing. I think your approach is very valid and in line with what I had in mind.

> Basically the Pravega community aligns the connector releases with the Pravega mainline release

This certainly would mean that there is little value in coupling connector versions, so it makes a good case for having separate connector repos.

> and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches)

I'd like to give connector devs a simple way to express which Flink versions the current branch is compatible with. From there we can generate the compatibility matrix automatically and optionally also create different releases per supported Flink version.
Not sure if the latter is indeed better than having just one artifact that happens to run with multiple Flink versions. I guess it depends on what dependencies we are exposing. If the connector uses flink-connector-base, then we probably need separate artifacts with their own poms anyway.

Best,

Arvid

On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:

Hi Arvid,

For the branching model, the Pravega Flink connector has some experience that I would like to share. Here [1][2] are the compatibility matrix and a wiki explaining the branching model and releases. Basically the Pravega community aligns the connector releases with the Pravega mainline release, and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches). For example, recently we had the 0.10.1 release [3], and in Maven Central we needed to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the 0.10.1 version [4].

There are some alternatives. Another solution that we once discussed but finally abandoned is to have an independent version just like the current CDC connector, and then give a big compatibility matrix to users. We think it would become too confusing as the connector develops. On the contrary, we could also do the opposite and align with the Flink version, maintaining several branches for the different system versions.
I would say this is only a fairly-OK solution, because it is a bit painful for maintainers: cherry-picks are very common and releases require a lot of work. However, if neither system has good backward compatibility, there seems to be no comfortable solution for their connector.

[1] https://github.com/pravega/flink-connectors#compatibility-matrix
[2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
[3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
[4] https://search.maven.org/search?q=pravega-connectors-flink

Best Regards,
Brian

-----Original Message-----
From: Arvid Heise <ar...@apache.org>
Sent: Friday, November 19, 2021 4:12 PM
To: dev
Subject: Re: [DISCUSS] Creating an external connector repository

Hi everyone,

we are currently in the process of setting up the flink-connectors repo [1] for new connectors, but we hit a wall that we currently cannot get past: the branching model.

To reiterate the original motivation of the external connector repo: we want to decouple the release cycle of a connector from Flink.
However, if we want to support semantic versioning in the connectors, with the ability to introduce breaking changes through major version bumps and to support bugfixes on old versions, then we need release branches similar to how Flink core operates.

Consider two connectors, let's call them kafka and hbase. We have kafka in versions 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change), and hbase only at 1.0.A.

Now our current assumption was that we can work with a mono-repo under the ASF (flink-connectors). Then, for release branches, we found 3 options:

1. We could create some ugly mess with the cross product of connector and version: kafka-release-1.0, kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main issue is not the number of branches (that's something git can handle), but that the state of kafka is undefined in hbase-release-1.0. That's a recipe for disaster and makes releasing connectors very cumbersome (CI would only execute and publish hbase SNAPSHOTs on hbase-release-1.0).
2. We could avoid the undefined state by having an empty master, with each release branch really only holding the code of its connector. But that's also not great: any user who looks at the repo and sees no connector would assume that it's dead.
3.
We could have synced releases similar to the CDC connectors [2]. That means that if any connector introduces a breaking change, all connectors get a new major version. I find it quite confusing for a user if hbase gets a new release without any change because kafka introduced a breaking change.

To fully decouple the release cycles and CI of connectors, we could add individual repositories under the ASF (flink-connector-kafka, flink-connector-hbase). Then we can apply the same branching model as before. I quickly checked if there are precedents for this approach in the Apache community, and just by scanning alphabetically I found cordova with 70 and couchdb with 77 Apache repos respectively. So it certainly seems like other projects have approached our problem in that way, and the Apache organization is okay with it. I currently expect at most 20 additional repos for connectors and, in the future, at most 10 each for formats and filesystems if we also move those out at some point in time. So we would be at a total of 50 repos.

Note that for all options, we need to provide a compatibility matrix that we aim to autogenerate.

Now for the potential downsides that we internally discussed:

- How can we ensure common infrastructure code, utilities, and quality?
I propose to add a flink-connector-common that contains all these things and is added as a git submodule/subtree to the repos.
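The autogenerated compatibility matrix mentioned above could, as a rough sketch, be derived from simple per-branch metadata. The metadata format here (branch name plus supported Flink versions) is invented purely for illustration; a real source could be a small file checked into each release branch:

```shell
# Sketch: build a Markdown compatibility matrix from per-branch metadata.
# The branch names and Flink versions are illustrative examples.
metadata="release-1.0:1.13,1.14
release-1.1:1.14
main:1.14,1.15-SNAPSHOT"

matrix="| Connector branch | Supported Flink versions |
|------------------|--------------------------|"
# Append one table row per metadata line (fields separated by ':').
while IFS=: read -r branch versions; do
  matrix="$matrix
| $branch | $versions |"
done <<EOF
$metadata
EOF
echo "$matrix"
```

A CI job could run something like this per repo and publish the result into the documentation.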
- Do we implicitly discourage connector developers from maintaining more than one connector with a fragmented code base?
That is certainly a risk. However, I currently also see few devs working on more than one connector. It may actually help keep the devs that maintain a specific connector on the hook. We could use GitHub issues to track bugs and feature requests, and a dev can focus their limited time on getting that one connector right.

So WDYT? Compared to some intermediate suggestions with split repos, the big difference is that everything remains under the Apache umbrella and the Flink community.

[1] https://github.com/apache/flink-connectors
[2] https://github.com/ververica/flink-cdc-connectors/

On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:

Hi everyone,

I created the flink-connectors repo [1] to advance the topic. We will create a proof-of-concept in the next few weeks as a special branch that I'll then use for discussions. If the community agrees with the approach, that special branch will become the master. If not, we can reiterate over it or create competing POCs.
If someone wants to try things out in parallel, just make sure that you are not accidentally pushing POCs to the master.

As a reminder: we will not move any current connector out of Flink at this point in time, so everything in Flink will remain as is and be maintained there.

Best,

Arvid

[1] https://github.com/apache/flink-connectors

On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi everyone,

From the discussion, it seems to me that we have different opinions on whether to have an ASF umbrella repository or to host the connectors outside of the ASF. It also seems that this is not really the problem to solve. Since there are many good arguments for either approach, we could simply start with an ASF umbrella repository and see how people adopt it. If the individual connectors cannot move fast enough, or if people prefer not to buy into the more heavyweight ASF processes, then they can host the code somewhere else. We simply need to make sure that these connectors are discoverable (e.g. via flink-packages).
The more important problem seems to be providing common tooling (testing, infrastructure, documentation) that can easily be reused. Similarly, it has become clear that the Flink community needs to improve on providing stable APIs. I think it is not realistic to first complete these tasks before starting to move connectors to dedicated repositories. As Stephan said, creating a connector repository will force us to pay more attention to API stability and also to think about which testing tools are required. Hence, I believe that starting to add connectors to a different repository than apache/flink will help improve our connector tooling (declaring testing classes as public, creating a common test utility repo, creating a repo template) and vice versa. Hence, I like Arvid's proposed process, as it will start kicking things off without letting this effort fizzle out.

Cheers,
Till

On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:

Thank you all for the nice discussion!

From my point of view, I very much like the idea of putting connectors in a separate repository. But I would argue it should be part of Apache Flink, similar to flink-statefun, flink-ml, etc.

I share many of the reasons for that:
- As argued many times, it reduces the complexity of the Flink repo, increases response times of CI, etc.
- Much lower barrier of contribution, because an unstable connector would not destabilize the whole build. Of course, we would need to make sure we set this up the right way, with connectors having individual CI runs, build status, etc. But it certainly seems possible.

I would argue some points a bit differently than some cases made before:

(a) I believe the separation would increase connector stability, because it really forces us to work on the connectors against the APIs like any external developer would. A mono-repo is somehow the wrong thing if, in practice, you actually want to guarantee stable internal APIs at some layer, because the mono-repo makes it easy to just change something on both sides of the API (provider and consumer) seamlessly.

Major refactorings in Flink would need to keep all connector API contracts intact, or we would need a new version of the connector API.

(b) We may even be able to move towards more lightweight and automated releases over time, even if we stay in Apache Flink with that repo. This isn't yet fully aligned with the Apache release policies, but there are board discussions about whether there can be bot-triggered releases (by dependabot) and how that could fit into the Apache process.
This doesn't seem to be quite there just yet, but seeing those discussions start is a good sign, and there is a good chance we can do some things there. I am not sure whether we should let bots trigger releases, because a final human look at things isn't a bad thing, especially given the popularity of software supply-chain attacks recently.

I do share Chesnay's concerns about complexity in tooling, though, for both release tooling and test tooling. They are not incompatible with this approach, but they are a task we need to tackle during this change, which will add additional work.

On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:

Hi folks,

I think some questions came up, and I'd like to address the question of the timing.

> Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

The short answer is: as often as needed.
- If there is a CVE in a dependency and we need to bump it: release immediately.
- If there is a new feature merged, release soonish. We may collect a few successive features before a release.
>> >>>>>>>>>>>>>> - If there is a bugfix, release immediately or soonish >> >>>>> depending >> >>>>>>>>>>>>>> on >> >>>>>>>>>>>> the >> >>>>>>>>>>>>>> severity and if there are workarounds available. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> We should not limit ourselves; the whole idea of >> >> independent >> >>>>>>>>>>>>>> releases >> >>>>>>>>>>>> is >> >>>>>>>>>>>>>> exactly that you release as needed. There is no release >> >>>>> planning >> >>>>>>>>>>>>>> or anything needed, you just go with a release as if it >> >> was an >> >>>>>>>>>>>>>> external artifact. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> (1) is the connector API already stable? >> >>>>>>>>>>>>>>> From another discussion thread [1], connector API is far >> >>>>> from >> >>>>>>>>>>>> stable. >> >>>>>>>>>>>>>>> Currently, it's hard to build connectors against multiple >> >>>>> Flink >> >>>>>>>>>>>>> versions. >> >>>>>>>>>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and >> >> 1.13 >> >>>>> -> >> >>>>>>>>>>>>>>> 1.14 >> >>>>>>>>>>>>> and >> >>>>>>>>>>>>>>> maybe also in the future versions, because Table >> >> related >> >>>>> APIs >> >>>>>>>>>>>>>>> are >> >>>>>>>>>>>>> still >> >>>>>>>>>>>>>>> @PublicEvolving and new Sink API is still @Experimental. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> The question is: what is stable in an evolving system? We >> >>>>>>>>>>>>>> recently discovered that the old SourceFunction needed to >> >> be >> >>>>>>>>>>>>>> refined such that cancellation works correctly [1]. So >> >> that >> >>>>>>>>>>>>>> interface is in Flink since >> >>>>>>>>>>>> 7 >> >>>>>>>>>>>>>> years, heavily used also outside, and we still had to >> >> change >> >>>>> the >> >>>>>>>>>>>> contract >> >>>>>>>>>>>>>> in a way that I'd expect any implementer to recheck their >> >>>>>>>>>>>> implementation. 
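The cancellation contract Arvid mentions can be illustrated with a minimal sketch. This is not Flink's actual SourceFunction interface, just a hypothetical stand-in showing the usual pattern: run() loops while a volatile flag is set, and cancel() may flip that flag from another thread.

```java
// Hedged sketch of the cancellation contract discussed above; the class
// and method names are illustrative, not Flink's real SourceFunction API.
public class CancellableSource {
    // volatile so that a cancel() called from another thread is seen by run()
    private volatile boolean running = true;
    private int emitted = 0;

    public void run() {
        // stand-in for the emit loop; bounded so the sketch always terminates
        while (running && emitted < 1_000_000) {
            emitted++;
        }
    }

    public void cancel() {
        running = false;
    }

    public boolean isRunning() {
        return running;
    }
}
```

The subtlety alluded to above is that the exact obligations of run() after cancel() is called (e.g. how promptly it must return) are part of the contract, so tightening them can force existing implementers to revisit their code even when it keeps compiling.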
It might not be necessary to change anything, and you can probably use the same code for all Flink versions, but still: the interface was not stable in the strictest sense.

If we focus just on API changes on the unified interfaces, then we expect one more change to the Sink API to support compaction. For Table API, there will most likely also be some changes in 1.15. So we could wait for 1.15. But I'm questioning whether that's really necessary, because we will add more functionality beyond 1.15 without breaking the API. For example, we may add more unified connector metrics. If you want to use them in your connector, you have to support multiple Flink versions anyhow. So rather than focusing the discussion on "when is stuff stable", I'd rather focus on "how can we support building connectors against multiple Flink versions" and make it as painless as possible.

Chesnay pointed out that we could use different branches for different Flink versions, which sounds like a good suggestion. With a mono-repo, we can't use branches that way anyway (there is no way to have release branches per connector without chaos).
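One way to make "one connector codebase, several Flink versions" less painful is a small shim that probes for a newer API at runtime and falls back otherwise. A hypothetical sketch; the method name `registerUnifiedMetric` is invented for illustration and is not a real Flink API:

```java
import java.lang.reflect.Method;

// Hypothetical shim: prefer a newer metric-registration API if the linked
// Flink version provides it, otherwise fall back to a legacy path.
public class MetricShim {
    public static String register(Object metricGroup, String name) {
        try {
            // Probe for a method that only newer versions would offer.
            Method m = metricGroup.getClass()
                    .getMethod("registerUnifiedMetric", String.class);
            m.invoke(metricGroup, name);
            return "unified";
        } catch (ReflectiveOperationException e) {
            // Older Flink version: the legacy registration path would go here.
            return "legacy";
        }
    }
}
```

Per-version branches, as suggested above, avoid the reflection entirely; a shim like this is mainly useful when a single artifact must run against several Flink minor versions.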
In these branches, we could provide shims to simulate future features in older Flink versions such that, code-wise, the source code of a specific connector may not diverge (much). For example, to register unified connector metrics, we could simulate the current approach also in some utility package of the mono-repo.

> I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

That is a very fair point. I'm actually surprised to see that MiniClusterWithClientResource is not public. I see it being used in all connectors, especially outside of Flink. I fear that as long as we do not have connectors outside, we will not properly annotate and maintain these utilities, in a classic hen-and-egg problem. I will outline an idea at the end.

> The connectors need to be adopted and require at least one release per Flink minor release. However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?
>
> Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).

Yes, that's why I wanted to automate the processes more, which is not that easy under ASF. Maybe we automate the source provision across supported versions and have one vote thread for all versions of a connector?

> From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help the release.
> Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Yes, I also feel that ASF is too restrictive for our needs. But it feels like there are too many that see it differently and I think we need

> (2) Flink testability without connectors.
> This is a very good question. How can we guarantee the new Source and Sink API are stable with only test implementations?

We can't and shouldn't. Since the connector repo is managed by Flink, a Flink release manager needs to check if the Flink connectors are actually working prior to creating an RC. That's similar to how flink-shaded and flink core are related.

So here is one idea that I had to get things rolling. We are going to address the external repo iteratively, without compromising what we already have:

Phase 1: add new contributions to the external repo. We use that time to set up infra accordingly and optimize release processes. We will identify test utilities that are not yet public/stable and fix that.
Phase 2: add ports to the new unified interfaces of existing connectors. That requires a previous Flink release to make utilities stable. Keep old interfaces in flink-core.
Phase 3: remove old interfaces in flink-core for some connectors (tbd at a later point).
Phase 4: optionally move all remaining connectors (tbd at a later point).

I'd envision having ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I'm an open source developer primarily focused on Apache Iceberg.

I'm happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular.

I'd also love to be able to contribute or assist in any way I can. I don't mean to thread jack, but are there any meetings or community syncs, specifically around the connector APIs, that I might join / be invited to?
I did want to add that even though I've experienced some of the pain points of integrating with an evolving system / API (catalog support is, generally speaking, pretty new everywhere in this space), I also agree personally that you shouldn't slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my personal opinion) always be a losing game. Some folks will simply stay behind on versions regardless, until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can help provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions with one project / codebase and minimal to no reflection (our desired goal).

If there's anything I can do or any way I can be of assistance, please don't hesitate to reach out. Or find me on ASF slack 😀

I greatly appreciate your general concern for the needs of downstream connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

Hi,

I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in developer community, maturity of the connectors, their specialized/limited usage etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas

On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

> the connectors need to be adopted and require at least one release per Flink minor release.

However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?

From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

Regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13 -> 1.14) are the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.
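Flink grades API maturity with stability annotations. The sketch below uses toy stand-ins defined inline (not the real classes from `org.apache.flink.annotation`) to show how such a graded API surface looks to a connector author:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StabilitySketch {
    // Toy stand-ins for Flink's stability annotations
    // (the real ones live in org.apache.flink.annotation).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Public {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface PublicEvolving {}

    // Stable within a major version: safe for external connectors to build on.
    @Public
    interface SourceApi { void open(); }

    // May still change between minor versions: connectors must expect churn.
    @PublicEvolving
    interface SinkApi { void flush(); }

    public static void main(String[] args) {
        System.out.println(SourceApi.class.isAnnotationPresent(Public.class));
        System.out.println(SinkApi.class.isAnnotationPresent(PublicEvolving.class));
    }
}
```

Promoting an interface from @PublicEvolving to @Public is exactly the one-way commitment being debated here: once made, external connector repos can rely on it across minor releases.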
b) Isn't this *mostly* orthogonal to where the connector code lives? Yes, as long as there are breaking changes, the connectors need to be adopted and require at least one release per Flink minor release. Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.

Cheers,

Konstantin

On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions, and I would like to know your opinions if we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?

> Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14, and maybe also in future versions, because Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.

> Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

This is a very good question. How can we guarantee the new Source and Sink API are stable with only test implementations?

Best,
Jark

On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:

Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself. I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so they can be safely used by those connectors. I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.

I am worried that we would not be able to achieve those frequent releases of connectors if we are putting these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps and prohibits automation, and I think it would be hard to scale out frequent releases of connectors. I'm curious how others think this challenge could be solved.
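The automation question ties into the CI setup mentioned at the top of the thread (GitHub Actions). A hypothetical workflow for an externalized connector repository, matrix-building against several Flink versions; all action versions, Flink versions, and the `flink.version` build property are illustrative assumptions, not decisions from this thread:

```yaml
# Hypothetical CI for an externalized connector repository.
name: build
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # illustrative Flink versions; the per-version matrix is the point
        flink: ["1.13.5", "1.14.3"]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v2
        with:
          distribution: temurin
          java-version: '8'
      # assumes the connector build exposes a flink.version property
      - run: mvn -B verify -Dflink.version=${{ matrix.flink }}
```

A matrix like this is one answer to the "no connector breaks when we make changes to Flink core" success criterion discussed above: every supported Flink version is compiled and tested on every change, without a manual release step.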
Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our current management of connectors. I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.

It would be great if we can find a setup that allows connectors to be released independently of core Flink, and each connector to be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing. Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF; more often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.

Thanks,
Thomas

On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well, then so are the connector repos.

(As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.

On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk