+1 to Shirshanka Das's proposal for pulling connectors into a separate repo. The Kafka Connect model is worth emulating here.
That said, I prefer DIL connectors to be maintained as a standalone open source repository outside of Apache Gobblin, for several reasons:

1. As already mentioned in the thread below, the connector library will evolve much more rapidly than the Gobblin core libraries. It is therefore better to have separate sets of committers who are attuned to the pace of development in their respective libraries. This will ultimately lead to faster code reviews, bug fixes, etc.
2. I imagine the community for connectors to be very different from the community for Gobblin core, and it is better to cultivate and support these communities independently.
3. Having DIL connectors outside Apache Gobblin allows Gobblin to support a marketplace of connectors discoverable via a catalog. In this end state, we could have multiple implementations of the same connector with different feature sets catering to different use cases.

Of course, any framework enhancements that are necessary to support DIL connectors can be contributed back to Gobblin core.

HTH,
Sudarshan

________________________________
From: Shirshanka Das <[email protected]>
Sent: Monday, March 22, 2021 11:40 PM
To: [email protected] <[email protected]>
Subject: Re: [DISCUSS] Connectors in Apache Gobblin

Hi Chris,

Thanks for this proposal! I think we have had quite a few issues with our monolithic repository, and I think it has hindered the development and maintenance of new connectors. JB makes some good points that are worth considering.

My 2c: My vote is to separate the connectors out into their own repo, and in fact to support multiple repos that can each contain separate connectors. This will also help us clarify the "public API" of the Gobblin framework versus the internal details that many connectors probably depend on today.
I would rather follow the Kafka Connect model: the core framework has APIs and is versioned independently from connector implementations, which can live in other repositories. Implementations should feature in the "Connector Matrix" as part of the documentation for discoverability. There can be an official catalog of supported connectors, and maybe that can be our first "repo" that Abhishek is proposing. But I would make sure we are not creating a new monorepo pattern with it.

What do others think?

Shirshanka

On Mon, Mar 22, 2021 at 10:09 PM, Jean-Baptiste Onofre <[email protected]> wrote:

> Hi Chris,
>
> I agree that connectors are very important. Other Apache projects became
> popular mostly thanks to their set of connectors (I’m thinking about
> Apache Beam, Apache Camel, or Apache Karaf Decanter for instance).
> Connectors allow more users to "integrate" Gobblin into their ecosystem,
> so they would grow our user community. They will also grow our dev
> community, as it’s probably easier to contribute to a connector than to
> the Gobblin core.
>
> About the repo vs module, there are two questions IMHO:
> 1. How to keep the API/code in sync between Gobblin core and the
> connectors
> 2. Do we plan to have a different release cycle between core and
> connectors (even if it’s always possible to release a module atomically)
>
> IMHO, if we plan to do a Gobblin release including core + connectors,
> then a module is easier.
>
> Regards
> JB
>
> On 22 March 2021 at 23:44, Chris Li <[email protected]> wrote:
>
> Proposal:
>
> DIL (LinkedIn internal project name) is a generic multi-stage Gobblin
> connector library.
> The code can be accessed here:
> https://github.com/linkedin/gobblin-connectors. Its core features and
> high-level descriptions are shared here:
> https://engineering.linkedin.com/blog/2021/data-integration-library.
>
> Per initial discussion with members of the Gobblin community, we are
> proposing a separate sub-repo for this library.
>
> Why:
> Some thoughts/justifications for a sub-repo vs. a module in the main
> Gobblin repo.
>
> 1. Gobblin connectors are an important part of the Gobblin ecosystem, but
> the development of connectors is relatively independent of Gobblin core.
> 2. Connectors are where open source communities can contribute the most,
> and they will grow much faster than Gobblin core.
> 3. The new connector library is a comprehensive package of unique design
> patterns. This is where the data integration diversity challenge will be
> addressed. The importance of this code base grows by the day as more
> integration scenarios become supported.
> 4. The new connector library evolves and replaces many prior Gobblin
> connectors under the “gobblin-modules” module. A separate repo will help
> avoid confusion.
> 5. Separating core and ecosystem modules can help improve isolation and
> reduce the number of defects.
>
> Regards,
> Chris
