Hi, all

I understand very well why the community maintainers want to move the 
connectors to an external repository. Indeed, developing and maintaining the 
connectors takes a lot of energy, and much of that work does not touch the 
Flink core framework, so moving them out could reduce the maintenance pressure 
on the community side.

I only have one concern: once we migrate these connectors to external projects, 
how can we ensure they remain of high quality? All of Flink's built-in 
connectors are developed or reviewed by committers, and connector bugs reported 
via JIRA and the mailing lists are currently fixed quickly. How will the Flink 
community keep up the development rhythm of the connectors after the move? In 
other words, will these connectors still be first-class citizens of the Flink 
community, and if so, how do we guarantee that?

Recently, I have been maintaining a series of CDC connectors in the Flink CDC 
project [1]. My feeling is that developing and maintaining connectors is not 
easy. Contributors to the Flink CDC project have taken some steps in this 
direction, such as building connector integration tests [2] and setting up 
documentation management [3]; a rough sketch of such an integration test is 
included below. Personally, I don't have a strong preference for moving the 
built-in connectors out or keeping them. If the final decision of this 
discussion turns out to be moving them out, I'm happy to share our experience 
and help in the new connector project.
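
To make the testing point concrete, here is a minimal sketch of the kind of 
connector integration test I mean, using JUnit 5 and Testcontainers to spin up 
a throwaway MySQL instance (assuming the Testcontainers, JUnit 5, and MySQL 
JDBC driver dependencies are on the classpath). The image tag, table schema, 
and class name are only assumptions for illustration, not the actual Flink CDC 
test code:

import org.junit.jupiter.api.Test;
import org.testcontainers.containers.MySQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

import static org.junit.jupiter.api.Assertions.assertTrue;

@Testcontainers
class MySqlConnectorSmokeTest {

    // Disposable MySQL instance managed by Testcontainers; the image tag is
    // an assumption for this sketch.
    @Container
    private static final MySQLContainer<?> MYSQL = new MySQLContainer<>("mysql:8.0");

    @Test
    void databaseIsReachableAndSeeded() throws Exception {
        // Seed a small table that a connector under test could then capture.
        try (Connection conn = DriverManager.getConnection(
                MYSQL.getJdbcUrl(), MYSQL.getUsername(), MYSQL.getPassword());
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE products (id INT PRIMARY KEY, name VARCHAR(64))");
            stmt.execute("INSERT INTO products VALUES (1, 'flink')");
        }
        // A real connector test would now run a Flink job against this
        // database and assert on the emitted records; here we only check that
        // the container is up and usable.
        assertTrue(MYSQL.isRunning());
    }
}

The actual tests in the project go much further than this, but the 
container-based setup is the common starting point.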

Best,
Leonard
[1]https://github.com/ververica/flink-cdc-connectors
[2]https://github.com/ververica/flink-cdc-connectors/runs/3902664601
[3]https://ververica.github.io/flink-cdc-connectors/master/

> On Oct 18, 2021, at 19:00, David Morávek <d...@apache.org> wrote:
> 
> We are mostly talking about the freedom this would bring to the connector 
> authors, but we still don't have answers for the important topics:
> 
> - How exactly are we going to maintain the high quality standard of the 
> connectors?
> - What would the connector release cycle look like? Is this going to affect 
> the Flink release cycle?
> - How would the documentation process / generation look like?
> - Not all of the connectors rely solely on the Stable APIs. Moving them 
> outside of the Flink code-base will make any refactoring on the Flink side 
> significantly more complex, as it potentially needs to be reflected in all 
> connectors. There are some possible solutions, such as Gradle's included 
> builds, but we're far away from that. How are we planning to address this?
> - How would we develop connectors against an unreleased Flink version? Java 
> snapshots have many limits when used for cross-repository development.
> - With appropriate tooling, this whole thing is achievable even with the 
> single repository that we already have. It is just a matter of having a more 
> fine-grained build / release process. Have you tried to research this option?
> 
> I'd personally strongly advise against moving the connectors out of the ASF 
> umbrella. The ASF brings legal guarantees, the hard-gained trust of the 
> users, and high quality standards to the table. I still fail to see any good 
> reason for giving this up. Also, this decision would be hard to reverse, 
> because it would most likely require a new donation to the ASF (would this 
> require consent from all contributors, as there is no clear ownership?).
> 
> Best,
> D.
> 
> 
> On Mon, Oct 18, 2021 at 12:12 PM Qingsheng Ren <renqs...@gmail.com> wrote:
> Thanks for driving this discussion Arvid! I think this will be one giant 
> leap for the Flink community. Externalizing connectors would give connector 
> developers more freedom in developing, releasing, and maintaining them, 
> which can attract more developers to contribute their connectors and expand 
> the Flink ecosystem.
> 
> Regarding where to host the connectors, I prefer to use a separate 
> organization outside the Apache umbrella. If we keep all connectors under 
> Apache, I think there isn't much difference compared to keeping them in the 
> Flink main repo: connector developers would still require permissions from 
> Flink committers to contribute, and the release process would have to follow 
> Apache rules, which goes against our initial motivation for externalizing 
> connectors.
> 
> Using a separate GitHub organization would maximize the freedom provided to 
> developers. An ideal structure in my mind would be 
> "github.com/flink-connectors/flink-connector-xxx". The newly established 
> flink-extended org might be another choice, but considering the number of 
> connectors, I prefer to use a dedicated org for connectors to avoid crowding 
> out the other repos under flink-extended.
> 
> In the meantime, we need to provide a well-established standard / guideline 
> for contributing connectors, covering CI, testing, and docs (maybe we can't 
> provide the resources for running them, but we should give enough guidance 
> on how to set them up) to keep the quality of connectors high. I'm happy to 
> help build these fundamental bricks. Also, since the Kafka connector is 
> widely used among Flink users, we could make it a "model" of how to build 
> and contribute a well-qualified connector to the Flink ecosystem, and we can 
> still use this trusted connector for Flink's E2E tests.
> 
> Again, I believe this will definitely boost the expansion of the Flink 
> ecosystem. Very excited to see the progress!
> 
> Best,
> 
> Qingsheng Ren
> On Oct 15, 2021, 8:47 PM +0800, Arvid Heise <ar...@apache.org> wrote:
> > Dear community,
> > Today I would like to kickstart a series of discussions around creating an 
> > external connector repository. The main idea is to decouple the release 
> > cycle of Flink from the release cycles of the connectors. This is a common 
> > approach in other big data analytics projects and seems to scale better 
> > than the current approach. In particular, it will yield the following 
> > changes.
> >  • Faster releases of connectors: New features can be added more quickly, 
> > bugs can be fixed immediately, and we can have faster security patches in 
> > case of direct or indirect (through dependencies) security flaws.
> >  • New features can be added to old Flink versions: If the connector API 
> > didn't change, the same connector jar may be used with different Flink 
> > versions. Thus, new features can also immediately be used with older Flink 
> > versions. A compatibility matrix on each connector page will help users 
> > find suitable connector versions for their Flink versions.
> >  • More activity and contributions around connectors: If we ease the 
> > contribution and development process around connectors, we will see faster 
> > development and also more connectors. Since that heavily depends on the 
> > chosen approach discussed below, more details will be shown there.
> >  • An overhaul of the connector page: In the future, all known connectors 
> > will be shown on the same page in a similar layout independent of where 
> > they reside. They could be hosted on external project pages (e.g., Iceberg 
> > and Hudi), on some company page, or may stay within the main Flink 
> > repository. Connectors may receive some sort of quality seal such that 
> > users can quickly assess the production-readiness, and we could also add 
> > which community/company promises which kind of support.
> >  • If we take (some) connectors out of Flink, Flink CI will be faster and 
> > Flink devs will experience fewer build instabilities (which mostly come 
> > from connectors). That would also speed up Flink development.
> > Now I'd first like to collect your viewpoints on the ideal state. Let's 
> > first recap which approaches we currently have:
> >  • We have half of the connectors in the main Flink repository. Relatively 
> > few of them have received updates in the past couple of months.
> >  • Another large chunk of connectors is in Apache Bahir. It has recently 
> > seen its first release in 3 years.
> >  • There are a few other (Apache) projects that maintain a Flink 
> > connector, such as Apache Iceberg, Apache Hudi, and Pravega.
> >  • A few connectors are listed on company-related repositories, such as 
> > Apache Pulsar on StreamNative and CDC connectors on Ververica.
> > My personal observation is that having a repository per connector seems to 
> > increase the activity on a connector, as it's easier to maintain. For 
> > example, in Apache Bahir all connectors are built against the same Flink 
> > version, which may not be desirable when certain APIs change; for example, 
> > SinkFunction will eventually be deprecated and removed, while the new Sink 
> > interface may gain more features.
> > Now, I'd like to outline different approaches. All approaches will allow 
> > you to host your connector on any kind of personal, project, or company 
> > repository. We still want to provide a default place where users can 
> > contribute their connectors and hopefully grow a community around it. The 
> > approaches are:
> >  1. Create a mono-repo under the Apache umbrella where all connectors will 
> > reside, for example, github.com/apache/flink-connectors. That repository 
> > needs to follow the ASF's rules: no GitHub issues, no Dependabot or similar 
> > tools, and a strict manual release process. It would be under the Flink 
> > community, such that Flink committers can write to that repository but 
> > no-one else.
> >  2. Create a GitHub organization with small repositories, for example 
> > github.com/flink-connectors. Since it's not under the Apache umbrella, we 
> > are free to use whatever process we deem best (up to a future discussion). 
> > Each repository can have a shared list of maintainers + connector-specific 
> > committers. We can provide more automation. We may even allow different 
> > licenses to incorporate things like a connector to Oracle that cannot be 
> > released under the ASL.
> >  3. ??? <- please provide your additional approaches
> > In both cases, we will provide opinionated module/repository templates 
> > based on a connector testing framework and guidelines. Depending on the 
> > approach, we may need to enforce certain things.
> > I'd like to first focus on what the community would ideally seek and 
> > minimize the discussions around legal issues, which we would discuss 
> > later. For now, I'd also like to postpone the discussion of whether we 
> > move all or only a subset of connectors from Flink to the new default 
> > place, as it seems to be orthogonal to the fundamental discussion.
> > PS: If the external repository for connectors is successful, I'd also like 
> > to move out other things like formats, filesystems, and metric reporters 
> > in the far future. So I'm actually aiming for 
> > github.com/(apache/)flink-packages. But again this discussion is 
> > orthogonal to the basic one.
> > PPS: Depending on the chosen approach, there may be synergies with the 
> > recently approved flink-extended organization.
