Hi everyone,

If you have any more comments or questions, do let me know. Otherwise I'll open up a vote thread next week.
Best regards,

Martijn

On Tue, 11 Jan 2022 at 20:13, Martijn Visser <mart...@ververica.com> wrote:

Good question: we want to use the same setup as we currently have for Flink, so using the existing CI infrastructure.

On Mon, 10 Jan 2022 at 11:19, Chesnay Schepler <ches...@apache.org> wrote:

What CI resources do you actually intend to use? Asking since the ASF GHA resources are AFAIK quite overloaded.

On 05/01/2022 11:48, Martijn Visser wrote:

Hi everyone,

I wanted to summarise the email thread and see if there are any open items that still need to be discussed before we can finalise the discussion in this email thread:

1. About having multiple connectors in one repo, or each connector in its own repository

As explained by @Arvid Heise <ar...@apache.org>, we ultimately propose to have a single repository per connector, which seems to be favoured in the community.

2. About having the connector repositories under the ASF or not

The consensus is that all connectors would remain under the ASF.

I think we can categorise the remaining questions and concerns as the following:

3. How would we set up the testing?

We need to make sure that we provide a proper testing framework, which means that we provide a public Source and Sink testing framework. As mentioned extensively in the thread, we need to make sure that the necessary interfaces are properly annotated and at least @PublicEvolving. This also includes the test infrastructure, like MiniCluster. For the latter, we don't know exactly yet how to balance having publicly available test infrastructure vs. being able to iterate inside of Flink, but we can all agree this has to be solved.

For testing infrastructure, we would like to use GitHub Actions.
In the current state, it probably makes sense for a connector repo to follow the branching strategy of Flink. That will ensure a match between the released connector and Flink version. This should change once all the Flink interfaces have stabilised, so that you can use a connector with multiple Flink versions. That means we should have a nightly build test for:

- The `main` branch of the connector (which would be the unreleased version) against the `master` branch of Flink (the unreleased version of Flink).
- Any supported `release-X.YY` branch of the connector against the `release-X.YY` branch of Flink.

We should also have smoke-test E2E tests in Flink (one for DataStream, one for Table, one for SQL, one for Python) which load all the connectors and run an arbitrary test (post data on a source, load it into Flink, sink the output, and check that the output is as expected).

4. How would we integrate documentation?

Documentation for a connector should probably end up in the connector repository. The Flink website should contain one entrance to all connectors (so not the current approach where we have connectors per DataStream API, Table API, etc.). Each connector's documentation should end up as one menu item under connectors, containing all necessary information for the DataStream, Table, SQL and Python implementations.

5. Which connectors should end up in the external connector repo?

I'll open up a separate thread on this topic to have a parallel discussion on that. We should reach consensus on both threads before we can move forward on this topic as a whole.

Best regards,

Martijn

On Fri, 10 Dec 2021 at 04:47, Thomas Weise <t...@apache.org> wrote:

+1 for repo per connector from my side also.

Thanks for trying out the different approaches.
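The nightly-build matrix described above (connector `main` against Flink `master`, and matching `release-X.YY` branches against each other) can be sketched roughly as follows. The branch names are illustrative and a real setup would live in the CI configuration rather than a script:

```shell
# Sketch: derive the (connector branch, Flink branch) pairs a nightly
# build would test, following the strategy from the thread.
# Branch names below are illustrative examples, not real release lines.
connector_branches="main release-1.14 release-1.13"
for branch in $connector_branches; do
  if [ "$branch" = "main" ]; then
    flink_branch="master"   # unreleased connector vs. unreleased Flink
  else
    flink_branch="$branch"  # a release line tracks the matching Flink release line
  fi
  echo "nightly: connector/$branch vs flink/$flink_branch"
done
```

This only prints the matrix; each emitted pair would correspond to one nightly CI job.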
Where would the common/infra pieces live? In a separate repository with its own release?

Thomas

On Thu, Dec 9, 2021 at 12:42 PM Till Rohrmann <trohrm...@apache.org> wrote:

Sorry if I was a bit unclear. +1 for the single repo per connector approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 5:41 PM Till Rohrmann <trohrm...@apache.org> wrote:

+1 for the single repo approach.

Cheers,
Till

On Thu, Dec 9, 2021 at 3:54 PM Martijn Visser <mart...@ververica.com> wrote:

I also agree that it feels more natural to go with a repo for each individual connector. Each repository can be made available at flink-packages.org so users can find them, next to referring to them in documentation. +1 from my side.

On Thu, 9 Dec 2021 at 15:38, Arvid Heise <ar...@apache.org> wrote:

Hi all,

We tried out Chesnay's proposal and went with Option 2. Unfortunately, we ran into some tough nuts to crack and feel like we hit a dead end:

- The main pain point with the outlined Frankensteinian connector repo is how to handle shared code / infra code. If we have it in some <common> branch, then we need to merge the common branch into the connector branch on update. However, it's unclear to me how improvements to the common branch that naturally appear while working on a specific connector go back into the common branch. You can't use a pull request from your branch, or else your connector code would poison the connector-less common branch. So you would probably manually copy the files over to a common branch and create a PR branch for that.
- A weird solution could be to have the common branch as a submodule in the repo itself (if that's even possible). I'm sure that this setup would blow the minds of all newcomers.
- Similarly, it's mandatory to have safeguards against code from connector A poisoning connector B, common, or main. I had a similar setup in the past and code from two "distinct" branch types constantly swept over.
- We could also say that we simply release <common> independently and just have a Maven (SNAPSHOT) dependency on it. But that would create a weird flow if you need to change something in common, where you need to constantly switch branches back and forth.
- In general, the Frankensteinian approach is very switch-intensive. If you maintain 3 connectors and need to fix one build instability in each at the same time (quite common nowadays for some reason) and you have 2 review rounds, you need to switch branches 9 times, ignoring changes to common.

Additionally, we still have the rather user/dev-unfriendly main branch that is mostly empty. I'm also not sure we can generate an overview README.md to make it more friendly here, because in theory every connector branch should be based on main, and we would get merge conflicts.

I'd like to propose once again to go with individual repositories.

- The only downside that we discussed so far is that we have more initial setup to do. Since we would organically grow the number of connectors/repositories, that load is quite distributed. We can offer templates after finding a good approach, which can even be used by outside organizations.
- Regarding secrets, I think it's actually an advantage that the Kafka connector has no access to the AWS secrets. If there are secrets to be shared across connectors, we can and should use Azure's Variable Groups (I have used them in the past to share Nexus creds across repos). That would also make rotation easy.
- Working on different connectors would be rather easy, as all modern IDEs support multi-repo setups in the same project. You still need to do multiple releases in case you update common code (accessed either through Nexus or a git submodule) and want to release your connector.
- There is no difference in how many CI runs there are between the two approaches.
- Individual repositories also have the advantage of allowing external incubation. Let's assume someone builds connector A and hosts it in their organization (a very common setup). If they want to contribute the code to Flink, we could simply transfer the repository into the ASF after ensuring Flink coding standards. We would then retain the git history and GitHub issues.

Is there any point that I'm missing?

On Fri, Nov 26, 2021 at 1:32 PM Chesnay Schepler <ches...@apache.org> wrote:

For sharing workflows we should be able to use composite actions. We'd have the main definition files in the flink-connectors repo, which we also need to tag/release, and which other branches/repos can then import. These are also versioned, so we don't have to worry about accidentally breaking stuff.
These could also be used to enforce certain standards/interfaces, such that we can automate more things (e.g., integration into the Flink documentation).

It is true that Option 2 and dedicated repositories share a lot of properties. While I did say in an offline conversation that in that case we might just as well use separate repositories, I'm not so sure anymore. One repo would make administration a bit easier; for example, secrets wouldn't have to be applied to each repo (we wouldn't want certain secrets to be set up organization-wide). I overall also like that one repo would present a single access point; you can't "miss" a connector repo, and I would hope that having it as one repo would nurture more collaboration between the connectors, which after all need to solve similar problems.

It is a fair point that the branching model would be quite weird, but I think that would subside pretty quickly.

Personally I'd go with Option 2, and if that doesn't work out we can still split the repo later on. (Which should then be a trivial matter of copying all <connector>/* branches and renaming them.)

On 26/11/2021 12:47, Till Rohrmann wrote:

Hi Arvid,

Thanks for updating this thread with the latest findings. The described limitations for a single connector repo sound suboptimal to me.

* Option 2 sounds as if we try to simulate multiple connector repos inside of a single repo.
I also don't know how we would share code between the different branches (sharing infrastructure would probably be easier, though). This seems to have the same limitations as dedicated repos, with the downside of a not very intuitive branching model.
* Isn't option 1 kind of a degenerate version of option 2, where we have some unrelated code from other connectors in the individual connector branches?
* Option 3 has the downside that someone creating a release has to release all connectors. This means that she either has to sync with the different connector maintainers or has to be able to release all connectors on her own. We are already seeing in the Flink community that releases require quite good communication/coordination between the different people working on different Flink components. Given our goals to make connector releases easier and more frequent, I think that coupling different connector releases might be counter-productive.

To me it does not sound very practical to use a mono-repository without some more advanced build infrastructure that, for example, allows different git roots in different connector directories. Maybe the mono-repo can be a catch-all repository for connectors that want to be released in lock-step (Option 3) with all the other connectors the repo contains. But for connectors that change frequently, having a dedicated repository that allows independent releases sounds preferable to me.
What utilities and infrastructure code do you intend to share? Using git submodules can definitely be one option to share code. However, it might also be OK to depend on flink-connector-common artifacts, which could make things easier. Where I am unsure is whether git submodules can be used to share infrastructure code (e.g. the .github/workflows), because you need these files in the repo to trigger the CI infrastructure.

Cheers,
Till

On Thu, Nov 25, 2021 at 1:59 PM Arvid Heise <ar...@apache.org> wrote:

Hi Brian,

Thank you for sharing. I think your approach is very valid and in line with what I had in mind.

> Basically the Pravega community aligns the connector releases with the Pravega mainline release

This certainly would mean that there is little value in coupling connector versions, so it makes a good case for having separate connector repos.

> and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches)

I'd like to give connector devs a simple way to express which Flink versions the current branch is compatible with. From there we can generate the compatibility matrix automatically and optionally also create different releases per supported Flink version.
Not sure if the latter is indeed better than having just one artifact that happens to run with multiple Flink versions. I guess it depends on what dependencies we are exposing. If the connector uses flink-connector-base, then we probably need separate artifacts with their own poms anyway.

Best,

Arvid

On Fri, Nov 19, 2021 at 10:55 AM Zhou, Brian <b.z...@dell.com> wrote:

Hi Arvid,

For the branching model, the Pravega Flink connector has some experience that I would like to share. Here [1][2] are the compatibility matrix and a wiki explaining the branching model and releases. Basically the Pravega community aligns the connector releases with the Pravega mainline release, and maintains the connector with the latest 3 Flink versions (CI will publish snapshots for all these 3 branches). For example, recently we had the 0.10.1 release [3], and in Maven Central we needed to upload three artifacts (for Flink 1.13, 1.12, 1.11) for the 0.10.1 version [4].

There are some alternatives. Another solution that we once discussed but finally abandoned is to have an independent version just like the current CDC connector, and then give a big compatibility matrix to users. We think it would become too confusing as the connector develops. On the contrary, we could also do the opposite and align with the Flink version, maintaining several branches for the different system versions.
I would say this is only a fairly-OK solution, because it is a bit painful for maintainers: cherry-picks are very common and releases require a lot of work. However, if neither system has good backward compatibility, there seems to be no comfortable solution for their connector.

[1] https://github.com/pravega/flink-connectors#compatibility-matrix
[2] https://github.com/pravega/flink-connectors/wiki/Versioning-strategy-for-Flink-connector
[3] https://github.com/pravega/flink-connectors/releases/tag/v0.10.1
[4] https://search.maven.org/search?q=pravega-connectors-flink

Best Regards,
Brian

-----Original Message-----
From: Arvid Heise <ar...@apache.org>
Sent: Friday, November 19, 2021 4:12 PM
To: dev
Subject: Re: [DISCUSS] Creating an external connector repository

Hi everyone,

we are currently in the process of setting up the flink-connectors repo [1] for new connectors, but we hit a wall that we currently cannot get past: the branching model.

To reiterate the original motivation of the external connector repo: we want to decouple the release cycle of a connector from Flink.
However, if we want to support semantic versioning in the connectors, with the ability to introduce breaking changes through major version bumps and to support bugfixes on old versions, then we need release branches similar to how Flink core operates.

Consider two connectors, let's call them kafka and hbase. We have kafka in versions 1.0.X, 1.1.Y (small improvement), 2.0.Z (config option change), and hbase only at 1.0.A.

Now our current assumption was that we can work with a mono-repo under the ASF (flink-connectors). Then, for release branches, we found 3 options:

1. We could create some ugly mess with the cross product of connector and version: kafka-release-1.0, kafka-release-1.1, kafka-release-2.0, hbase-release-1.0. The main issue is not the number of branches (that's something git can handle), but that the state of kafka is undefined in hbase-release-1.0. That's a recipe for disaster and makes releasing connectors very cumbersome (CI would only execute and publish hbase SNAPSHOTs on hbase-release-1.0).
2. We could avoid the undefined state by having an empty master, with each release branch really only holding the code of its connector. But that's also not great: any user who looks at the repo and sees no connector would assume that it's dead.
3.
We could have synced releases similar to the CDC connectors [2]. That means that if any connector introduces a breaking change, all connectors get a new major version. I find it quite confusing for a user if hbase gets a new release without any change because kafka introduced a breaking change.

To fully decouple the release cycles and CI of connectors, we could add individual repositories under the ASF (flink-connector-kafka, flink-connector-hbase). Then we can apply the same branching model as before. I quickly checked if there are precedents for this approach in the Apache community, and just by scanning alphabetically I found cordova with 70 and couchdb with 77 Apache repos respectively. So it certainly seems like other projects have approached our problem in that way, and the Apache organization is okay with it. I currently expect at most 20 additional repos for connectors and, in the future, at most 10 each for formats and filesystems if we also move those out at some point in time. So we would be at a total of 50 repos.

Note that for all options, we need to provide a compatibility matrix that we aim to autogenerate.

Now for the potential downsides that we internally discussed:

- How can we ensure common infrastructure code, utilities, and quality?
I propose to add a flink-connector-common that contains all these things and is added as a git submodule/subtree to the repos.
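The autogenerated compatibility matrix mentioned above could, as a rough sketch, be derived from simple per-branch metadata. The metadata format here (branch name plus supported Flink versions) is invented purely for illustration; a real source could be a small file checked into each release branch:

```shell
# Sketch: build a Markdown compatibility matrix from per-branch metadata.
# The branch names and Flink versions are illustrative examples.
metadata="release-1.0:1.13,1.14
release-1.1:1.14
main:1.14,1.15-SNAPSHOT"

matrix="| Connector branch | Supported Flink versions |
|------------------|--------------------------|"
# Append one table row per metadata line (fields separated by ':').
while IFS=: read -r branch versions; do
  matrix="$matrix
| $branch | $versions |"
done <<EOF
$metadata
EOF
echo "$matrix"
```

A CI job could run something like this per repo and publish the result into the documentation.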
- Do we implicitly discourage connector developers from maintaining more than one connector with a fragmented code base?
That is certainly a risk. However, I currently also see few devs working on more than one connector. It may actually help keep the devs that maintain a specific connector on the hook. We could use GitHub issues to track bugs and feature requests, and a dev can focus their limited time on getting that one connector right.

So WDYT? Compared to some intermediate suggestions with split repos, the big difference is that everything remains under the Apache umbrella and the Flink community.

[1] https://github.com/apache/flink-connectors
[2] https://github.com/ververica/flink-cdc-connectors/

On Fri, Nov 12, 2021 at 3:39 PM Arvid Heise <ar...@apache.org> wrote:

Hi everyone,

I created the flink-connectors repo [1] to advance the topic. We will create a proof-of-concept in the next few weeks as a special branch that I'll then use for discussions. If the community agrees with the approach, that special branch will become the master. If not, we can reiterate over it or create competing POCs.
If someone wants to try things out in parallel, just make sure that you are not accidentally pushing POCs to the master.

As a reminder: we will not move any current connector out of Flink at this point in time, so everything in Flink will remain as is and be maintained there.

Best,

Arvid

[1] https://github.com/apache/flink-connectors

On Fri, Oct 29, 2021 at 6:57 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi everyone,

From the discussion, it seems to me that we have different opinions on whether to have an ASF umbrella repository or to host the connectors outside of the ASF. It also seems that this is not really the problem to solve. Since there are many good arguments for either approach, we could simply start with an ASF umbrella repository and see how people adopt it. If the individual connectors cannot move fast enough, or if people prefer not to buy into the more heavyweight ASF processes, then they can host the code somewhere else. We simply need to make sure that these connectors are discoverable (e.g. via flink-packages).
The more important problem seems to be providing common tooling (testing, infrastructure, documentation) that can easily be reused. Similarly, it has become clear that the Flink community needs to improve on providing stable APIs. I think it is not realistic to first complete these tasks before starting to move connectors to dedicated repositories. As Stephan said, creating a connector repository will force us to pay more attention to API stability and also to think about which testing tools are required. Hence, I believe that starting to add connectors to a different repository than apache/flink will help improve our connector tooling (declaring testing classes as public, creating a common test utility repo, creating a repo template) and vice versa. Hence, I like Arvid's proposed process, as it will start kicking things off without letting this effort fizzle out.

Cheers,
Till

On Thu, Oct 28, 2021 at 11:44 AM Stephan Ewen <se...@apache.org> wrote:

Thank you all for the nice discussion!

From my point of view, I very much like the idea of putting connectors in a separate repository. But I would argue it should be part of Apache Flink, similar to flink-statefun, flink-ml, etc.

I share many of the reasons for that:
- As argued many times, it reduces the complexity of the Flink repo, increases response times of CI, etc.
- Much lower barrier of contribution, because an unstable connector would not destabilize the whole build. Of course, we would need to make sure we set this up the right way, with connectors having individual CI runs, build status, etc. But it certainly seems possible.

I would argue some points a bit differently than some cases made before:

(a) I believe the separation would increase connector stability, because it really forces us to work on the connectors against the APIs like any external developer would. A mono-repo is somehow the wrong thing if, in practice, you actually want to guarantee stable internal APIs at some layer, because the mono-repo makes it easy to just change something on both sides of the API (provider and consumer) seamlessly.

Major refactorings in Flink would need to keep all connector API contracts intact, or we would need a new version of the connector API.

(b) We may even be able to move towards more lightweight and automated releases over time, even if we stay in Apache Flink with that repo. This isn't yet fully aligned with the Apache release policies, but there are board discussions about whether there can be bot-triggered releases (by dependabot) and how that could fit into the Apache process.
This doesn't seem to be quite there just yet, but seeing those discussions start is a good sign, and there is a good chance we can do some things there. I am not sure whether we should let bots trigger releases, because a final human look at things isn't a bad thing, especially given the popularity of software supply-chain attacks recently.

I do share Chesnay's concerns about complexity in tooling, though, for both release tooling and test tooling. They are not incompatible with this approach, but they are a task we need to tackle during this change, which will add additional work.

On Tue, Oct 26, 2021 at 10:31 AM Arvid Heise <ar...@apache.org> wrote:

Hi folks,

I think some questions came up, and I'd like to address the question of the timing.

> Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

The short answer is: as often as needed.
- If there is a CVE in a dependency and we need to bump it: release immediately.
- If there is a new feature merged, release soonish. We may collect a few successive features before a release.
>> >>>>>>>>>>>>>> - If there is a bugfix, release immediately or soonish >> >>>>> depending >> >>>>>>>>>>>>>> on >> >>>>>>>>>>>> the >> >>>>>>>>>>>>>> severity and if there are workarounds available. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> We should not limit ourselves; the whole idea of >> >> independent >> >>>>>>>>>>>>>> releases >> >>>>>>>>>>>> is >> >>>>>>>>>>>>>> exactly that you release as needed. There is no release >> >>>>> planning >> >>>>>>>>>>>>>> or anything needed, you just go with a release as if it >> >> was an >> >>>>>>>>>>>>>> external artifact. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> (1) is the connector API already stable? >> >>>>>>>>>>>>>>> From another discussion thread [1], connector API is far >> >>>>> from >> >>>>>>>>>>>> stable. >> >>>>>>>>>>>>>>> Currently, it's hard to build connectors against multiple >> >>>>> Flink >> >>>>>>>>>>>>> versions. >> >>>>>>>>>>>>>>> There are breaking API changes both in 1.12 -> 1.13 and >> >> 1.13 >> >>>>> -> >> >>>>>>>>>>>>>>> 1.14 >> >>>>>>>>>>>>> and >> >>>>>>>>>>>>>>> maybe also in the future versions, because Table >> >> related >> >>>>> APIs >> >>>>>>>>>>>>>>> are >> >>>>>>>>>>>>> still >> >>>>>>>>>>>>>>> @PublicEvolving and new Sink API is still @Experimental. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>> The question is: what is stable in an evolving system? We >> >>>>>>>>>>>>>> recently discovered that the old SourceFunction needed to >> >> be >> >>>>>>>>>>>>>> refined such that cancellation works correctly [1]. So >> >> that >> >>>>>>>>>>>>>> interface is in Flink since >> >>>>>>>>>>>> 7 >> >>>>>>>>>>>>>> years, heavily used also outside, and we still had to >> >> change >> >>>>> the >> >>>>>>>>>>>> contract >> >>>>>>>>>>>>>> in a way that I'd expect any implementer to recheck their >> >>>>>>>>>>>> implementation. 
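The cancellation contract Arvid mentions can be illustrated with a minimal sketch. This is not Flink's actual SourceFunction interface, just a hypothetical stand-in showing the usual pattern: run() loops while a volatile flag is set, and cancel() may flip that flag from another thread.

```java
// Hedged sketch of the cancellation contract discussed above; the class
// and method names are illustrative, not Flink's real SourceFunction API.
public class CancellableSource {
    // volatile so that a cancel() called from another thread is seen by run()
    private volatile boolean running = true;
    private int emitted = 0;

    public void run() {
        // stand-in for the emit loop; bounded so the sketch always terminates
        while (running && emitted < 1_000_000) {
            emitted++;
        }
    }

    public void cancel() {
        running = false;
    }

    public boolean isRunning() {
        return running;
    }
}
```

The subtlety alluded to above is that the exact obligations of run() after cancel() is called (e.g. how promptly it must return) are part of the contract, so tightening them can force existing implementers to revisit their code even when it keeps compiling.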
It might not be necessary to change anything, and you can probably use the same code for all Flink versions, but still: the interface was not stable in the strictest sense.

If we focus just on API changes on the unified interfaces, then we expect one more change to the Sink API to support compaction. For Table API, there will most likely also be some changes in 1.15. So we could wait for 1.15. But I'm questioning whether that's really necessary, because we will add more functionality beyond 1.15 without breaking the API. For example, we may add more unified connector metrics. If you want to use them in your connector, you have to support multiple Flink versions anyhow. So rather than focusing the discussion on "when is stuff stable", I'd rather focus on "how can we support building connectors against multiple Flink versions" and make it as painless as possible.

Chesnay pointed out that we could use different branches for different Flink versions, which sounds like a good suggestion. With a mono-repo, we can't use branches that way anyway (there is no way to have release branches per connector without chaos).
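One way to make "one connector codebase, several Flink versions" less painful is a small shim that probes for a newer API at runtime and falls back otherwise. A hypothetical sketch; the method name `registerUnifiedMetric` is invented for illustration and is not a real Flink API:

```java
import java.lang.reflect.Method;

// Hypothetical shim: prefer a newer metric-registration API if the linked
// Flink version provides it, otherwise fall back to a legacy path.
public class MetricShim {
    public static String register(Object metricGroup, String name) {
        try {
            // Probe for a method that only newer versions would offer.
            Method m = metricGroup.getClass()
                    .getMethod("registerUnifiedMetric", String.class);
            m.invoke(metricGroup, name);
            return "unified";
        } catch (ReflectiveOperationException e) {
            // Older Flink version: the legacy registration path would go here.
            return "legacy";
        }
    }
}
```

Per-version branches, as suggested above, avoid the reflection entirely; a shim like this is mainly useful when a single artifact must run against several Flink minor versions.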
In these branches, we could provide shims to simulate future features in older Flink versions such that, code-wise, the source code of a specific connector may not diverge (much). For example, to register unified connector metrics, we could simulate the current approach also in some utility package of the mono-repo.

> I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

That is a very fair point. I'm actually surprised to see that MiniClusterWithClientResource is not public. I see it being used in all connectors, especially outside of Flink. I fear that as long as we do not have connectors outside, we will not properly annotate and maintain these utilities, in a classic hen-and-egg problem. I will outline an idea at the end.

> The connectors need to be adopted and require at least one release per Flink minor release. However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?
>
> Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).

Yes, that's why I wanted to automate the processes more, which is not that easy under ASF. Maybe we automate the source provision across supported versions and have one vote thread for all versions of a connector?

> From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
> 1) we can have a more flexible and faster release cycle
> 2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help the release.
> Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Yes, I also feel that ASF is too restrictive for our needs. But it feels like there are too many that see it differently and I think we need

> (2) Flink testability without connectors.
> This is a very good question. How can we guarantee the new Source and Sink API are stable with only test implementations?

We can't and shouldn't. Since the connector repo is managed by Flink, a Flink release manager needs to check if the Flink connectors are actually working prior to creating an RC. That's similar to how flink-shaded and flink core are related.

So here is one idea that I had to get things rolling. We are going to address the external repo iteratively, without compromising what we already have:

Phase 1: add new contributions to the external repo. We use that time to set up infra accordingly and optimize release processes. We will identify test utilities that are not yet public/stable and fix that.
Phase 2: add ports to the new unified interfaces of existing connectors. That requires a previous Flink release to make utilities stable. Keep old interfaces in flink-core.
Phase 3: remove old interfaces in flink-core for some connectors (tbd at a later point).
Phase 4: optionally move all remaining connectors (tbd at a later point).

I'd envision having ~3 months between starting the different phases. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-23527

On Thu, Oct 21, 2021 at 7:12 AM Kyle Bendickson <k...@tabular.io> wrote:

Hi all,

My name is Kyle and I'm an open source developer primarily focused on Apache Iceberg.

I'm happy to help clarify or elaborate on any aspect of our experience working on a relatively decoupled connector that is downstream and pretty popular.

I'd also love to be able to contribute or assist in any way I can. I don't mean to thread jack, but are there any meetings or community syncs, specifically around the connector APIs, that I might join / be invited to?
I did want to add that even though I've experienced some of the pain points of integrating with an evolving system / API (catalog support is, generally speaking, pretty new everywhere in this space), I also agree personally that you shouldn't slow down development velocity too much for the sake of external connectors. Getting to a performant and stable place should be the primary goal, and slowing that down to support stragglers will (in my personal opinion) always be a losing game. Some folks will simply stay behind on versions regardless, until they have to upgrade.

I am working on ensuring that the Iceberg community stays within 1-2 versions of Flink, so that we can help provide more feedback or contribute things that might improve our ability to support multiple Flink runtimes / versions with one project / codebase and minimal to no reflection (our desired goal).

If there's anything I can do or any way I can be of assistance, please don't hesitate to reach out. Or find me on ASF slack 😀

I greatly appreciate your general concern for the needs of downstream connector integrators!

Cheers,
Kyle Bendickson (GitHub: kbendick)
Open Source Developer
kyle [at] tabular [dot] io

On Wed, Oct 20, 2021 at 11:35 AM Thomas Weise <t...@apache.org> wrote:

Hi,

I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure.

Without the stable surface of core Flink, changes will leak into downstream dependencies and force lock-step updates. Refactoring across N repos is more painful than in a single repo. Those with experience developing downstream of Flink will know the pain, and that isn't limited to connectors. I don't remember a Flink "minor version" update that was just a dependency version change and did not force other downstream changes.

Imagine a project with a complex set of dependencies. Let's say Flink version A plus Flink-reliant dependencies released by other projects (Flink-external connectors, Beam, Iceberg, Hudi, ..). We don't want a situation where we bump the core Flink version to B and things fall apart (interface changes, utilities that were useful but not public, transitive dependencies etc.).

The discussion here also highlights the benefits of keeping certain connectors outside Flink, whether that is due to differences in developer community, maturity of the connectors, their specialized/limited usage etc. I would like to see that as a sign of a growing ecosystem, and most of the ideas that Arvid has put forward would benefit further growth of the connector ecosystem.

As for keeping connectors within Apache Flink: I prefer that as the path forward for "essential" connectors like FileSource, KafkaSource, ... And we can still achieve a more flexible and faster release cycle.

Thanks,
Thomas

On Wed, Oct 20, 2021 at 3:32 AM Jark Wu <imj...@gmail.com> wrote:

Hi Konstantin,

> the connectors need to be adopted and require at least one release per Flink minor release.

However, this will make the releases of connectors slower, e.g. maintaining features for multiple branches and releasing multiple branches. I think the main purpose of having an external connector repository is in order to have "faster releases of connectors"?

From the perspective of CDC connector maintainers, the biggest advantage of maintaining it outside of the Flink project is that:
1) we can have a more flexible and faster release cycle
2) we can be more liberal with committership for connector maintainers, which can also attract more committers to help the release.

Personally, I think maintaining one connector repository under the ASF may not have the above benefits.

Best,
Jark

On Wed, 20 Oct 2021 at 15:14, Konstantin Knauf <kna...@apache.org> wrote:

Hi everyone,

Regarding the stability of the APIs: I think everyone agrees that connector APIs which are stable across minor versions (1.13 -> 1.14) are the mid-term goal. But:

a) These APIs are still quite young, and we shouldn't make them @Public prematurely either.
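Flink grades API maturity with stability annotations. The sketch below uses toy stand-ins defined inline (not the real classes from `org.apache.flink.annotation`) to show how such a graded API surface looks to a connector author:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StabilitySketch {
    // Toy stand-ins for Flink's stability annotations
    // (the real ones live in org.apache.flink.annotation).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Public {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface PublicEvolving {}

    // Stable within a major version: safe for external connectors to build on.
    @Public
    interface SourceApi { void open(); }

    // May still change between minor versions: connectors must expect churn.
    @PublicEvolving
    interface SinkApi { void flush(); }

    public static void main(String[] args) {
        System.out.println(SourceApi.class.isAnnotationPresent(Public.class));
        System.out.println(SinkApi.class.isAnnotationPresent(PublicEvolving.class));
    }
}
```

Promoting an interface from @PublicEvolving to @Public is exactly the one-way commitment being debated here: once made, external connector repos can rely on it across minor releases.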
b) Isn't this *mostly* orthogonal to where the connector code lives? Yes, as long as there are breaking changes, the connectors need to be adopted and require at least one release per Flink minor release. Documentation-wise this can be addressed via a compatibility matrix for each connector, as Arvid suggested. IMO we shouldn't block this effort on the stability of the APIs.

Cheers,

Konstantin

On Wed, Oct 20, 2021 at 8:56 AM Jark Wu <imj...@gmail.com> wrote:

Hi,

I think Thomas raised very good questions, and I would like to know your opinions if we want to move connectors out of Flink in this version.

(1) Is the connector API already stable?

> Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

From another discussion thread [1], the connector API is far from stable. Currently, it's hard to build connectors against multiple Flink versions. There are breaking API changes both in 1.12 -> 1.13 and 1.13 -> 1.14, and maybe also in future versions, because Table-related APIs are still @PublicEvolving and the new Sink API is still @Experimental.

(2) Flink testability without connectors.

> Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

This is a very good question. How can we guarantee the new Source and Sink API are stable with only test implementations?

Best,
Jark

On Tue, 19 Oct 2021 at 23:56, Chesnay Schepler <ches...@apache.org> wrote:

Could you clarify what release cadence you're thinking of? There's quite a big range that fits "more frequent than Flink" (per-commit, daily, weekly, bi-weekly, monthly, even bi-monthly).

On 19/10/2021 14:15, Martijn Visser wrote:

Hi all,

I think it would be a huge benefit if we can achieve more frequent releases of connectors, which are not bound to the release cycle of Flink itself. I agree that in order to get there, we need to have stable interfaces which are trustworthy and reliable, so they can be safely used by those connectors. I do think that work still needs to be done on those interfaces, but I am confident that we can get there from a Flink perspective.

I am worried that we would not be able to achieve those frequent releases of connectors if we are putting these connectors under the Apache umbrella, because that means that for each connector release we have to follow the Apache release creation process. This requires a lot of manual steps and prohibits automation, and I think it would be hard to scale out frequent releases of connectors. I'm curious how others think this challenge could be solved.
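The automation question ties into the CI setup mentioned at the top of the thread (GitHub Actions). A hypothetical workflow for an externalized connector repository, matrix-building against several Flink versions; all action versions, Flink versions, and the `flink.version` build property are illustrative assumptions, not decisions from this thread:

```yaml
# Hypothetical CI for an externalized connector repository.
name: build
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # illustrative Flink versions; the per-version matrix is the point
        flink: ["1.13.5", "1.14.3"]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v2
        with:
          distribution: temurin
          java-version: '8'
      # assumes the connector build exposes a flink.version property
      - run: mvn -B verify -Dflink.version=${{ matrix.flink }}
```

A matrix like this is one answer to the "no connector breaks when we make changes to Flink core" success criterion discussed above: every supported Flink version is compiled and tested on every change, without a manual release step.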
Best regards,

Martijn

On Mon, 18 Oct 2021 at 22:22, Thomas Weise <t...@apache.org> wrote:

Thanks for initiating this discussion.

There are definitely a few things that are not optimal with our current management of connectors. I would not necessarily characterize it as a "mess" though. As the points raised so far show, it isn't easy to find a solution that balances competing requirements and leads to a net improvement.

It would be great if we can find a setup that allows connectors to be released independently of core Flink, and each connector to be released separately. Flink already has separate releases (flink-shaded), so that by itself isn't a new thing. Per-connector releases would need to allow for more frequent releases (without the baggage that a full Flink release comes with).

Separate releases would only make sense if the core Flink surface is fairly stable though. As evident from Iceberg (and also Beam), that's not the case currently. We should probably focus on addressing the stability first, before splitting code. A success criterion could be that we are able to build Iceberg and Beam against multiple Flink versions w/o the need to change code. The goal would be that no connector breaks when we make changes to Flink core. Until that's the case, code separation creates a setup where 1+1 or N+1 repositories need to move in lock step.

Regarding some connectors being more important for Flink than others: that's a fact. Flink w/o the Kafka connector (and a few others) isn't viable. Testability of Flink was already brought up: can we really certify a Flink core release without the Kafka connector? Maybe those connectors that are used in Flink e2e tests to validate functionality of core Flink should not be broken out?

Finally, I think that the connectors that move into separate repos should remain part of the Apache Flink project. Larger organizations tend to approve the use of and contribution to open source at the project level. Sometimes it is everything ASF; more often it is "Apache Foo". It would be fatal to end up with a patchwork of projects with potentially different licenses and governance to arrive at a working Flink setup. This may mean we prioritize usability over developer convenience, if that's in the best interest of Flink as a whole.

Thanks,
Thomas

On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler <ches...@apache.org> wrote:

Generally, the issues are reproducibility and control.

Stuff's completely broken on the Flink side for a week? Well, then so are the connector repos.

(As-is) You can't go back to a previous version of the snapshot. Which also means that checking out older commits can be problematic, because you'd still work against the latest snapshots, and they may not be compatible with each other.

On 18/10/2021 15:22, Arvid Heise wrote:

I was actually betting on snapshot versions. What are the limits? Obviously, we can only do a release of a 1.15 connector after 1.15 is released.

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk