I think we will get to a point where it makes sense for runners to
live in their own repositories, with their own release cadence, but
we're not at that point yet. One prerequisite is a stable API. We're
closing in on that with the portability protos, but many (Java)
runners also share the common runner core libraries, and those are
even less set in stone.

On the other hand, taking responsibility for maintaining all runners
is not a tenable or scalable position for the Beam project. If a
runner is merged, it should be understood that it can be "un-merged"
if it causes a maintenance burden. A completely separate
project/repository makes this less messy.

On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles <k...@apache.org> wrote:
>
> I agree with both of you, mostly :-)
>
> The monorepo approach doesn't work/scale well for shipped libraries (name a 
> Google library that silently just works and never causes any dependency 
> problems) and the pain we feel has been constant and increasing, but I don't 
> think we are at the breaking point.
>
> But Google's big monorepo [1] demonstrates similar benefits to what Kyle 
> describes. In the early stages the benefit of not having to think too hard 
> about build/test infra and share it everywhere is a big help, and it scales 
> well. Eventually, shipping test utility libraries and compliance suites can 
> be equivalent. And to your point - it is very helpful for users to know that 
> they can use CassandraIO with the other Beam artifacts. This is why Google 
> requires the whole big repo to depend on a single version of any 
> externally-controlled artifact. But, yes, as a consequence it is 
> preposterously difficult to stay up to date, since literally anything can 
> block progress. You need a unified escalation chain for that policy to make 
> sense. It is the definition of a healthy Apache project to *not* have that 
> (PMC is different).
>
> Independent dependencies, independent git histories, and independent release 
> cadence/process are all separate discussions.
>
> It is a broader question than this particular contribution, so let's merge 
> this runner before changing our whole way of doing things :-)
>
> Kenn
>
> [1] 
> https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
>  (really quite a balanced analysis)
>
> On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <kcwea...@google.com> wrote:
>>
>> > Should runners, current and future, be in the same repository as Beam
>> > core?
>>
>> In the distant past, runners lived in their own repositories, and then were 
>> donated to Beam. But Beam's current uber-repo setup allows a lot of 
>> convenience. For example, a ton of code (including core functionality and 
>> tests) is shared directly between runners, which is useful for keeping 
>> runners up to date and ensuring consistent behavior between them (in other 
>> words, maintainable and reliable).
>>
>> Generally, it is up to the authors of a particular Beam related 
>> project/subproject to decide whether to host their code in Beam or in a 
>> different repo, and up to the community to decide whether to take on the 
>> donation, as discussed in previous threads on the Twister2 runner. In this 
>> case, it seems there is agreement between the Twister2 runner authors and 
>> the community that the runner can be hosted in Beam proper.
>>
>> There are examples of successful independent Beam projects, such as 
>> Spotify's Scio, but having an independent project with its own releases 
>> requires a lot of dedicated resources, and the bar for entry for extending 
>> Beam should not be that high. All that's required of subproject authors is 
>> that they keep the subproject in step with Beam. If they can't maintain it 
>> any longer, the subproject can be allowed to bitrot without getting in 
>> anyone's way. On the other hand, I'm not sure of the details with Cassandra, 
>> but in general, a subproject should not have "the ability to block progress" 
>> just because it is contained in the Beam uber-repo.
>>
>> tl;dr Having an uber repo generally seems to work for Beam. Exceptions are 
>> few enough to be handled on a case-by-case basis.
>>
>> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold <elh...@ibiblio.org> 
>> wrote:
>>>
>>> Generic question without commenting on Twister2 specifically:
>>>
>>> Should runners, current and future, be in the same repository as Beam
>>> core? Can or should they be completely separate products with their
>>> own release cycles?
>>>
>>> Generally, loose coupling leads to more maintainable, reliable
>>> projects. Specifically, Cassandra is holding back some other changes
>>> in Beam and I really wish it didn't have the ability to block
>>> progress. The more different runners we have in core, the worse this
>>> problem is likely to become.
>>>
>>>
>>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>>> <pulasthi...@gmail.com> wrote:
>>> >
>>> > Hi
>>> >
>>> > I believe the pull request is pretty complete now with the help of 
>>> > Ismaël. Kenn, would you be able to take a look at it and suggest any 
>>> > changes if needed? The build checks and validation tests are passing at 
>>> > the moment. I will start working on the documentation that you mentioned 
>>> > in an earlier email separately.
>>> >
>>> > Best Regards,
>>> > Pulasthi
>>> >
>>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe 
>>> > <pulasthi...@gmail.com> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I have created the initial pull request [1] to contribute the Twister2 
>>> >> Beam runner to the Apache Beam codebase. More information on Twister2 
>>> >> can be found here [2] and the Twister2 codebase is available here [3]. At 
>>> >> the moment only batch mode is supported in the runner, but we are 
>>> >> planning to add streaming support and implement a portable runner for 
>>> >> Twister2 in the near future.
>>> >>
>>> >> As Kenn pointed out in an earlier email it would be great to have inputs 
>>> >> from the community regarding this contribution since it is a sizable 
>>> >> one. I am sure there are many improvements that can be done in the 
>>> >> contributed codebase with input from the community.
>>> >>
>>> >> [1] https://github.com/apache/beam/pull/10888
>>> >> [2] https://twister2.org/
>>> >> [3] https://github.com/DSC-SPIDAL/twister2
>>> >>
>>> >> Best Regards,
>>> >> Pulasthi
>>> >> --
>>> >> Pulasthi S. Wickramasinghe
>>> >> PhD Candidate  | Research Assistant
>>> >> School of Informatics and Computing | Digital Science Center
>>> >> Indiana University, Bloomington
>>> >> cell: 224-386-9035
>>> >
>>> >
>>> >
>>> > --
>>> > Pulasthi S. Wickramasinghe
>>> > PhD Candidate  | Research Assistant
>>> > School of Informatics and Computing | Digital Science Center
>>> > Indiana University, Bloomington
>>> > cell: 224-386-9035
>>>
>>>
>>>
>>> --
>>> Elliotte Rusty Harold
>>> elh...@ibiblio.org
