I think Robert's initial question needs to be focused on a particular split.

I agree that a "single project spanning multiple repos" does not make
sense. But separate projects in separate repos are pretty widely used :-).
The point of separate repos, IMO, would be to empower (and force) the
pieces to act as separate projects.

Every monorepo I have worked in has struggled with modularity problems. But
conversely, a project with poor modularity can thrive in a monorepo because
it is feasible to make changes across all the bits that are tightly
coupled. Because it is a subtext whenever a Google employee talks about
monorepos, I want to call out that Google's uniquely massive and
interesting monorepo requires a tremendous amount of bespoke infrastructure
to manage coupling, testing, ownership, etc.* It is not analogous to a
large repo on GitHub.

So... which pieces are "not separate enough" and why and how do we want to
make them separate?

I can think of some candidates that could benefit from some kind of
"separateness":

 - IOs or collections of IOs: separate release cadence, only build on
stable SDK releases (potential for diamond dep problems)
 - Portability protos: forces them to be highly stable and forces runners
to adapt to major iterations
 - Language SDKs: easier to build a community of devs with a clearly
familiar project structure and toolchain

Maybe the kinds of separation that folks want do not have to be a
separate repo, as mentioned. But it is still worth keeping in mind that most
infrastructure and UI is geared towards a certain scale of project (not
just repo): issue tracking, pull request management, mailing lists,
ownership, selective test execution, triaging test failures, etc.

At this point, I see strong arguments in both directions and think that a
specific proposal of a specific split at the right time deserves an
individualized discussion.

Kenn

*Other issues include governance and effectiveness at shipping
user-friendly libraries.




On Wed, Oct 10, 2018 at 11:12 AM Ankur Goenka <[email protected]> wrote:

> Hi,
>
> I think the subtext here is that development is hard in general. I agree
> with that. A major cause of it is the diversity of languages, the
> complexity of the project, and legacy code.
> To alleviate language-related issues, we are trying to keep the code
> modular, which it already is to a certain extent.
> On the other hand, tooling is still evolving and needs improvement. I also
> feel that tooling is a moving target and it's good to keep re-evaluating
> it.
> Tooling is a problem for everyone (the whole community) and we are
> actively trying to solve it. Gradle is a big step in that direction.
> I contribute to multiple languages myself. Many of the PRs have changes
> spanning multiple languages that have to be merged as a whole. I
> personally feel that having a unified build system makes it easier to do
> the checks and make sure things work.
> Even after the move to Gradle, I am still able to set up IntelliJ for Java,
> PyCharm for Python, and GoLand for Go as I did before. I am also still able
> to run "python setup.py sdist" as I could before Gradle.
> Gradle also acts as the top-level task manager, and most of the Python
> tasks are just plain shell commands stitched together.
> The only real problem that I face in my setup is the vendored Java jars,
> which only impact Java development.
> Documenting a separate, environment-specific setup for each language is
> probably sufficient to address the issue.
>
> I also agree with Max that splitting the repo will cause more pain than
> gain.
>
> ~Ankur
>
>
>
> On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau <[email protected]>
> wrote:
>
>>
>>
>>
>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I agree that splitting up Beam into separate repositories would cause
>>> more pain than gain.
>>>
>>> To a large degree we already have independent modules, e.g. runners/* or
>>> sdks/*. This is not the case for the core, though; it would be
>>> desirable to break it up further.
>>>
>>
>> I think this part is OK for everyone.
>>
>>
>>>
>>>  > possibly even with their own build system (unified only through a
>>>  > top-level "build everything" script that descends into each subdir and
>>>  > runs the appropriate command).
>>>
>>> This is almost what we have. Yes, there are some dependencies on the
>>> Beam Gradle Plugin, but even if we had completely independent build
>>> directories, you'd still want to have a shared config/tasks across the
>>> projects (which might bring you back to a setup similar to what we have).
>>>
>>> One of the pain points seems to be portability, which "polluted" some
>>> parts of the project (e.g. the legacy runners). As mentioned in this
>>> thread, that could have been solved with an abstraction. But the lack of
>>> an abstraction also forced us to adopt the portable pipeline code more
>>> quickly.
>>>
>>
>> Not at all. Assume we have a full build that handles portability, followed
>> by three concurrent builds (Go, Python, Java). Then we have the "current
>> step" in the CI, but devs are never affected by it, and the build does not
>> mess up their machines either.
>>
>> Today the main blocker is that the default "profile" (script) does not
>> match the developer persona, so there is no real hope of getting external
>> contributions from outside the Google-related folks, as the figures
>> mentioned earlier show. That is sad for a project promising unification
>> and collaboration between communities, IMHO.
>>
>>
>>>
>>> -Max
>>>
>>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>>> > Yep, for the split.
>>> >
>>> > As for the cleanliness point, it is closely tied to the build tools and
>>> > to the fake environments for modules that are not native to the build
>>> > tool (for instance, Go under Gradle, which is Java-first). This is why
>>> > having a real build that is natural for each language would be
>>> > beneficial, IMO.
>>> >
>>> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <[email protected]>
>>> > wrote:
>>> >
>>> >     Correct, it's more about "module splitting" than repositories.
>>> >
>>> >     Regards
>>> >     JB
>>> >
>>> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
>>> >      > Gotcha. So this is more about dividing the code (particularly
>>> >      > core) into finer modules, rather than splitting the modules into
>>> >      > separate repositories, right?
>>> >      >
>>> >      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
>>> >      > <[email protected]> wrote:
>>> >      >
>>> >      >     The point is that today we have a monolithic core, mostly
>>> >      >     providing abstract classes.
>>> >      >
>>> >      >     The idea is to have something more API oriented, with
>>> >      >     interfaces/SPI.
>>> >      >
>>> >      >     Our users would then be able to pick the parts of the core
>>> >      >     they want, resulting in lighter artifacts, and for us, it
>>> >      >     gives a more flexible approach.
>>> >      >
>>> >      >     Regards
>>> >      >     JB
>>> >      >
>>> >      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>>> >      >     > My question was not whether we should split the repo, but
>>> >      >     > why? (Dividing things into more (or fewer) modules within a
>>> >      >     > single repo is a separate question.) Maybe I'm just not
>>> >      >     > following what you mean by "more API oriented." It would
>>> >      >     > force stabler APIs.
>>> >      >     >
>>> >      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
>>> >      >     > <[email protected]> wrote:
>>> >      >     >
>>> >      >     >     Hi,
>>> >      >     >
>>> >      >     >     +1, I think we could split the core even deeper.
>>> >      >     >
>>> >      >     >     I discussed with Luke and Reuven introducing core-sql,
>>> >      >     >     core-schema, core-sdf, ...
>>> >      >     >
>>> >      >     >     It's not a huge effort, and it would allow us to move
>>> >      >     >     forward on Beam's "more API oriented" approach.
>>> >      >     >
>>> >      >     >     Regards
>>> >      >     >     JB
>>> >      >     >
>>> >      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>>> >      >     >     > Hi everyone,
>>> >      >     >     >
>>> >      >     >     > While IMHO it's too early to even be able to split
>>> >      >     >     > the repo, it's not too early to talk about it, and I
>>> >      >     >     > wanted to spin this off to keep the other thread
>>> >      >     >     > focused.
>>> >      >     >     >
>>> >      >     >     > In particular, I am trying to figure out exactly what
>>> >      >     >     > is hoped to be gained by splitting things up. In my
>>> >      >     >     > experience, a single project that spans multiple
>>> >      >     >     > repos has always come with excessive overhead and
>>> >      >     >     > pain. Of note, we recently merged the website and
>>> >      >     >     > dataflow-worker into the main repo *exactly* to avoid
>>> >      >     >     > this pain (though the latter was particularly bad due
>>> >      >     >     > to one of the repos being private).
>>> >      >     >     >
>>> >      >     >     > If need be, I don't see any reason we can't have a
>>> >      >     >     > single repo with directories
>>> >      >     >     >
>>> >      >     >     > model/
>>> >      >     >     > website/
>>> >      >     >     > java/
>>> >      >     >     > go/
>>> >      >     >     > ...
>>> >      >     >     >
>>> >      >     >     > possibly even with their own build system (unified
>>> >      >     >     > only through a top-level "build everything" script
>>> >      >     >     > that descends into each subdir and runs the
>>> >      >     >     > appropriate command). I'm not saying we should do
>>> >      >     >     > this (there is value in having a single consistent
>>> >      >     >     > build system, etc.) but it's possible. We could
>>> >      >     >     > probably even make separate releases out of this
>>> >      >     >     > single repo (if we wanted, though given that our
>>> >      >     >     > releases are time-based rather than feature-based, I
>>> >      >     >     > don't see much advantage here).
>>> >      >     >     >
>>> >      >     >     > Also, there was the comment.
>>> >      >     >     >
>>> >      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>>> >      >     >     > <[email protected]> wrote:
>>> >      >     >     >>
>>> >      >     >     >> Side note: Beam portability would be saner if it were
>>> >      >     >     >> added on top of the others rather than the opposite,
>>> >      >     >     >> which is what is done today.
>>> >      >     >     >
>>> >      >     >     > I think you brought this up before, Romain. I'm still
>>> >      >     >     > trying to wrap my head around what you mean here.
>>> >      >     >     > Could you elaborate on what such a structure would
>>> >      >     >     > look like?
>>> >      >     >
>>> >      >     >     --
>>> >      >     >     Jean-Baptiste Onofré
>>> >      >     >     [email protected]
>>> >      >     >     http://blog.nanthrax.net
>>> >      >     >     Talend - http://www.talend.com
>>> >      >     >
>>> >      >
>>> >
>>>
>>
