I think Robert's initial question needs to be focused on a particular split.
I agree that a "single project spanning multiple repos" does not make sense. But separate projects in separate repos is pretty widely used :-). The point of separate repos IMO would be to empower (and force) them to act as separate projects.

Every monorepo I have worked in has struggled with modularity problems. But conversely, a project with poor modularity can thrive in a monorepo because it is feasible to make changes across all the bits that are tightly coupled.

Because it is a subtext whenever a Google employee talks about monorepos, I want to call out that Google's uniquely massive and interesting monorepo requires a tremendous amount of bespoke infrastructure to manage coupling, testing, ownership, etc.* It is not analogous to a large repo on GitHub.

So... which pieces are "not separate enough", and why and how do we want to make them separate? I can think of some candidates that could benefit from some kind of "separateness":

 - IOs or collections of IOs: separate release cadence, only build on stable SDK releases (potential for diamond dep problems)
 - Portability protos: forces them to be highly stable and forces runners to adapt to major iterations
 - Language SDKs: easier to build a community of devs with a clearly familiar project structure and toolchain

Maybe the kinds of separation that folks want do not have to be a separate repo, as mentioned. But it is still important that most infrastructure and UI is geared towards a certain scale of project (not just repo): issue tracking, pull request management, mailing lists, ownership, selective test execution, triaging test failures, etc.

At this point, I see strong arguments in both directions and think that a specific proposal of a specific split at the right time deserves an individualized discussion.

Kenn

*Other issues include governance and effectiveness for shipping user-friendly libraries

On Wed, Oct 10, 2018 at 11:12 AM Ankur Goenka <[email protected]> wrote:

> Hi,
>
> I think the subtext here is that development is hard in general. I agree with it. And a major cause of it is the diversity of languages, the complexity of the project, and legacy code.
> To alleviate language-related issues, we are trying to have modular code, which we already have to a certain extent.
> On the other hand, tooling is still evolving and needs improvement. I also feel that tooling is a moving target and it's good to keep reevaluating it.
> Tooling is a problem for everyone (the whole community) and we are actively trying to solve it. Gradle is a big step towards it.
> I personally contribute to multiple languages. Many of the PRs have changes spanning languages and have to be merged as a whole. I personally feel that having a unified build system makes it easier to do the checks and make sure things work.
> Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for Python and GoLand for Go as I would have done earlier (before Gradle). I am also able to run "python setup.py sdist" as I was able to do before Gradle.
> Gradle is also acting as the top-level task manager, and most of the Python tasks are just plain shell commands stitched together.
> The only real problem that I face in my setup is the vendored Java jars, which only impact Java development.
> Probably documenting separate environment-specific setup for each language is sufficient to address the issue.
>
> I also agree with Max that splitting the repo will cause more pain than gain.
>
> ~Ankur
>
> On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau <[email protected]> wrote:
>
>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I agree that splitting up Beam into separate repositories would cause more pain than gain.
>>>
>>> To a large degree we already have independent modules, e.g. runners/* or sdks/*. Although this is not the case for the core. It would be desirable to break it up further.
>>
>> Think this part is ok for everyone.
>>
>>> > possibly even with their own build system (unified only through a top-level "build everything" script that descends into each subdir and runs the appropriate command).
>>>
>>> This is almost what we have. Yes, there are some dependencies on the Beam Gradle Plugin, but even if we had completely independent build directories, you'd still want to have a shared config/tasks across the projects (which might bring you back to a setup similar to what we have).
>>>
>>> One of the pain points seems to be the portability which "polluted" some parts of the project (e.g. legacy Runners). As mentioned in this thread that could have been solved with an abstraction. But the lack of abstraction also forced us to adopt the portable pipeline code quicker.
>>
>> Not at all. Assume we have a full build which is doing portability, then 3 concurrent builds (Go, Python, Java); then we have the "current step" in the CI, but the devs are never affected by that and the build does not mess up their machines either.
>>
>> Today the main blocker is that the default "profile" (script) does not match the dev persona, and therefore there is no real hope of having external contributions from outside Google-related people, as mentioned by the previous figures, which is sad for a project promising unification and work between communities IMHO.
>>
>>> -Max
>>>
>>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>>> > Yep for the split
>>> >
>>> > For the clean point, it is quite linked to the build tools and the fake env for modules that are not native to the build tool (Go for Gradle, which is Java first, for instance). This is why having a real build which is natural per language would be beneficial IMO.
>>> >
>>> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <[email protected]> wrote:
>>> >
>>> >     Correct, it's more "module splitting" than repositories indeed.
>>> >
>>> >     Regards
>>> >     JB
>>> >
>>> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
>>> > > Gotcha. So this is more about dividing the code (particularly core) into finer modules, rather than splitting the modules into separate repositories, right?
>>> > >
>>> > > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>> > >
>>> > >     The purpose is that we have a monolithic core today mostly providing abstract classes.
>>> > >
>>> > >     The idea is to have something more API oriented with interface/SPI.
>>> > >
>>> > >     Our users would then be able to pick the part of the core they want, resulting in lighter artifacts, and for us, it gives a more flexible approach.
>>> > >
>>> > >     Regards
>>> > >     JB
>>> > >
>>> > >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>>> > > > My question was not whether we should split the repo, but why? (Dividing things into more (or fewer) modules within a single repo is a separate question.) Maybe I'm just not following what you mean by "more API oriented." It would force stabler APIs.
>>> > > >
>>> > > > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>> > > >
>>> > > >     Hi,
>>> > > >
>>> > > >     +1, I even think we could split the core even deeper.
>>> > > >
>>> > > >     I discussed with Luke and Reuven to introduce core-sql, core-schema, core-sdf, ...
>>> > > >
>>> > > >     It's not a huge effort, and would allow us to move forward on Beam's "more API oriented" approach.
>>> > > >
>>> > > >     Regards
>>> > > >     JB
>>> > > >
>>> > > >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>>> > > > > Hi everyone,
>>> > > > >
>>> > > > > While IMHO it's too early to even be able to split the repo, it's not too early to talk about it, and I wanted to spin this off to keep the other thread focused.
>>> > > > >
>>> > > > > In particular, I am trying to figure out exactly what is hoped to be gained by splitting things up. In my experience, a single project that spans multiple repos has always come with excessive overhead and pain. Of note, we recently merged the website and dataflow-worker into the main repo *exactly* to avoid this pain (though the latter was particularly bad due to one of the repos being private).
>>> > > > >
>>> > > > > If need be, I don't see any reason we can't have a single repo with directories
>>> > > > >
>>> > > > >     model/
>>> > > > >     website/
>>> > > > >     java/
>>> > > > >     go/
>>> > > > >     ...
>>> > > > >
>>> > > > > possibly even with their own build system (unified only through a top-level "build everything" script that descends into each subdir and runs the appropriate command). I'm not saying we should do this (there is value in having a single consistent build system, etc.) but it's possible. We could probably even make separate releases out of this single repo (if we wanted, though given that our releases are time-based rather than feature-based, I don't see much advantage here).
>>> > > > >
>>> > > > > Also, there was the comment.
>>> > > > >
>>> > > > > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau <[email protected]> wrote:
>>> > > > >>
>>> > > > >> Side note: Beam portability would be saner if added on top of the others rather than the opposite, which is what is done today.
>>> > > > >
>>> > > > > I think you brought this up before, Romain. I'm still trying to wrap my head around what you mean here. Could you elaborate what such a structure would look like?
>>> > > >
>>> > > >     --
>>> > > >     Jean-Baptiste Onofré
>>> > > >     [email protected]
>>> > > >     http://blog.nanthrax.net
>>> > > >     Talend - http://www.talend.com
>>> > > >
>>> > >
>>> > >     --
>>> > >     Jean-Baptiste Onofré
>>> > >     [email protected]
>>> > >     http://blog.nanthrax.net
>>> > >     Talend - http://www.talend.com
>>> > >
>>> >
>>> >     --
>>> >     Jean-Baptiste Onofré
>>> >     [email protected]
>>> >     http://blog.nanthrax.net
>>> >     Talend - http://www.talend.com
>>> >
>>>
>>
