I agree with both of you, mostly :-)

The monorepo approach doesn't work/scale well for shipped libraries (name a
Google library that silently just works and never causes any dependency
problems) and the pain we feel has been constant and increasing, but I
don't think we are at the breaking point.

But Google's big monorepo [1] demonstrates similar benefits to what Kyle
describes. In the early stages, not having to think too hard about build/test
infra, and being able to share it everywhere, is a big help, and it scales
well. Eventually, shipping test utility libraries and compliance suites can
be equivalent. And to your point - it is very helpful for users to know
that they can use CassandraIO with the other Beam artifacts. This is why
Google requires the whole big repo to depend on a single version of any
externally-controlled artifact. But, yes, as a consequence it is
preposterously difficult to stay up to date, since literally anything can
block progress. You need a unified escalation chain for that policy to make
sense. It is the definition of a healthy Apache project to *not* have that
(PMC is different).

Independent dependencies, independent git histories, and independent
release cadence/process are all separate discussions.

It is a broader question than this particular contribution, so let's merge
this runner before changing our whole way of doing things :-)

Kenn

[1] https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
(really quite a balanced analysis)

On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <[email protected]> wrote:

> > Should runners, current and future, be in the same repository as Beam
> > core?
>
> In the distant past, runners lived in their own repositories, and then
> were donated to Beam. But Beam's current uber-repo setup allows a lot of
> convenience. For example, a ton of code (including core functionality and
> tests) is shared directly between runners, which is useful for keeping
> runners up to date and ensuring consistent behavior between them (in other
> words, maintainable and reliable).
>
> Generally, it is up to the authors of a particular Beam related
> project/subproject to decide whether to host their code in Beam or in a
> different repo, and up to the community to decide whether to take on the
> donation, as discussed in previous threads on the Twister2 runner. In this
> case, it seems there is agreement between the Twister2 runner authors and
> the community that the runner can be hosted in Beam proper.
>
> There are examples of successful independent Beam projects, such as
> Spotify's Scio, but having an independent project with its own releases
> requires a lot of dedicated resources, and the bar for entry for extending
> Beam should not be that high. All that's required of subproject authors is
> that they keep the subproject in step with Beam. If they can't maintain it
> any longer, the subproject can be allowed to bitrot without getting in
> anyone's way. On the other hand, I'm not sure of the details with
> Cassandra, but in general, a subproject should not have "the ability to
> block progress" just because it is contained in the Beam uber-repo.
>
> tl;dr Having an uber repo generally seems to work for Beam. Exceptions are
> few enough to be handled on a case-by-case basis.
>
> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold <[email protected]>
> wrote:
>
>> Generic question without commenting on Twister2 specifically:
>>
>> Should runners, current and future, be in the same repository as Beam
>> core? Can or should they be completely separate products with their
>> own release cycles?
>>
>> Generally, loose coupling leads to more maintainable, reliable
>> projects. Specifically, Cassandra is holding back some other changes
>> in Beam and I really wish it didn't have the ability to block
>> progress. The more different runners we have in core, the worse this
>> problem is likely to become.
>>
>>
>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>> <[email protected]> wrote:
>> >
>> > Hi
>> >
>> > I believe the pull request is pretty complete now with the help of
>> Ismaël. Kenn, would you be able to take a look at it and suggest any
>> changes if needed? The build checks and validation tests are passing at
>> the moment. I will start working on the documentation that you mentioned
>> in an earlier email separately.
>> >
>> > Best Regards,
>> > Pulasthi
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe <
>> [email protected]> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> I have created the initial pull request [1] to contribute the Twister2
>> Beam runner to the Apache Beam codebase. More information on Twister2 can
>> be found here[2] and the Twister2 codebase is available here[3]. At the
>> moment only batch mode is supported in the runner, but we are planning to
>> add stream support and implement a portable runner for Twister2 in the near
>> future.
>> >>
>> >> As Kenn pointed out in an earlier email, it would be great to have
>> input from the community regarding this contribution since it is a sizable
>> one. I am sure there are many improvements that can be done in the
>> contributed codebase with input from the community.
>> >>
>> >> [1] https://github.com/apache/beam/pull/10888
>> >> [2] https://twister2.org/
>> >> [3] https://github.com/DSC-SPIDAL/twister2
>> >>
>> >> Best Regards,
>> >> Pulasthi
>> >> --
>> >> Pulasthi S. Wickramasinghe
>> >> PhD Candidate  | Research Assistant
>> >> School of Informatics and Computing | Digital Science Center
>> >> Indiana University, Bloomington
>> >> cell: 224-386-9035
>> >
>> >
>> >
>> > --
>> > Pulasthi S. Wickramasinghe
>> > PhD Candidate  | Research Assistant
>> > School of Informatics and Computing | Digital Science Center
>> > Indiana University, Bloomington
>> > cell: 224-386-9035
>>
>>
>>
>> --
>> Elliotte Rusty Harold
>> [email protected]
>>
>
