Good points, Kenn. I think we mostly agree on what has been discussed in this thread about the pros and cons of having runners in our repository, but this is probably not the best moment to change any policy in that respect.
So if nobody objects, I think we can proceed. I am OOO this week and have less time to continue the code review, but I will be back next week to finish the review and hopefully get this merged with Pulasthi (sorry for the delay).

> (don't wait for me on code review - if Ismaël said it is good, then it is
> good.)

Thanks for your confidence. The Twister2 runner looks good so far, but I will confirm 100% next week :) In the meantime, if someone has some extra cycles to take a look, extra feedback is always welcome.

On Mon, Mar 9, 2020 at 5:50 AM Kenneth Knowles <[email protected]> wrote:
>
> I haven't heard anyone suggest that we need a vote. I haven't heard anyone
> object to this being merged to master. Some time ago, we mostly decided to
> favor master instead of branches, because it is so much smoother for
> contributors and users.
>
> So I am poking this thread one last time and otherwise I would consider it
> consensus that once code review is done the runner is a part of Beam
> (experimental!).
>
> (don't wait for me on code review - if Ismaël said it is good, then it is
> good.)
>
> Kenn
>
> On Fri, Mar 6, 2020 at 7:47 AM Pulasthi Supun Wickramasinghe
> <[email protected]> wrote:
>>
>> I understand that the discussion is on a more broad level than the Twister2
>> runner. From my experience developing the runner the main advantage of being
>> inside the beam project was the easy access to the wide range of tests and
>> other core/utility code as Kyle pointed out. Unmerging runners that are not
>> properly maintained and updated would be the most logical path to follow
>> since the internals of the runners are only well understood by developers of
>> that particular project. It would be unreasonable to expect the Beam
>> community to maintain them. And since the runners do not alter the core
>> API's I assume they would be easy to unmerge if the need arises.
>>
>> Talking specifically about Twister2 runner, we hope to continue developing
>> the runner in the future to add both streaming capability and develop a
>> portable runner as well. The team behind Twister2 is working towards the
>> goal to get the project into Apache Incubator in the near future (Hopefully
>> to submit the proposal in the next couple of months).
>>
>> Best Regards,
>> Pulasthi
>>
>>
>>
>> On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw <[email protected]> wrote:
>>>
>>> I think we will get to a point where it makes sense for runners to
>>> live in their own repositories, with their own release cadence, but
>>> we're not at that point yet. One prerequisite is a stable API--we're
>>> closing in on that with the portability protos, but many (java)
>>> runners actually share the common runner core libraries and that is
>>> even less set in stone.
>>>
>>> On the other hand, taking responsibility for maintaining all runners
>>> is not a tenable or scalable position for the Beam project. If a
>>> runner is merged, it should be understood that it can be "un-merged"
>>> if it causes a maintenance burden. A completely separate
>>> project/repository makes this less messy.
>>>
>>> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles <[email protected]> wrote:
>>> >
>>> > I agree with both of you, mostly :-)
>>> >
>>> > The monorepo approach doesn't work/scale well for shipped libraries (name
>>> > a Google library that silently just works and never causes any dependency
>>> > problems) and the pain we feel has been constant and increasing, but I
>>> > don't think we are at the breaking point.
>>> >
>>> > But Google's big monorepo [1] demonstrates similar benefits to what Kyle
>>> > describes. In the early stages the benefit of not having to think too
>>> > hard about build/test infra and share it everywhere is a big help, and it
>>> > scales well. Eventually, shipping test utility libraries and compliance
>>> > suites can be equivalent.
>>> > And to your point - it is very helpful for
>>> > users to know that they can use CassandraIO with the other Beam
>>> > artifacts. This is why Google requires the whole big repo to depend on a
>>> > single version of any externally-controlled artifact. But, yes, as a
>>> > consequence it is preposterously difficult to stay up to date, since
>>> > literally anything can block progress. You need a unified escalation
>>> > chain for that policy to make sense. It is the definition of a healthy
>>> > Apache project to *not* have that (PMC is different).
>>> >
>>> > Independent dependencies, independent git histories, and independent
>>> > release cadence/process are all separate discussions.
>>> >
>>> > It is a broader question than this particular contribution, so let's
>>> > merge this runner before changing our whole way of doing things :-)
>>> >
>>> > Kenn
>>> >
>>> > [1]
>>> > https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
>>> > (really quite a balanced analysis)
>>> >
>>> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <[email protected]> wrote:
>>> >>
>>> >> > Should runners, current and future, be in the same repository as Beam
>>> >> > core?
>>> >>
>>> >> In the distant past, runners lived in their own repositories, and then
>>> >> were donated to Beam. But Beam's current uber-repo setup allows a lot of
>>> >> convenience. For example, a ton of code (including core functionality
>>> >> and tests) is shared directly between runners, which is useful for
>>> >> keeping runners up to date and ensuring consistent behavior between them
>>> >> (in other words, maintainable and reliable).
>>> >>
>>> >> Generally, it is up to the authors of a particular Beam related
>>> >> project/subproject to decide whether to host their code in Beam or in a
>>> >> different repo, and up to the community to decide whether to take on the
>>> >> donation, as discussed in previous threads on the Twister2 runner.
>>> >> In this case, it seems there is agreement between the Twister2 runner
>>> >> authors and the community that the runner can be hosted in Beam proper.
>>> >>
>>> >> There are examples of successful independent Beam projects, such as
>>> >> Spotify's Scio, but having an independent project with its own releases
>>> >> requires a lot of dedicated resources, and the bar for entry for
>>> >> extending Beam should not be that high. All that's required of
>>> >> subproject authors is that they keep the subproject in step with Beam.
>>> >> If they can't maintain it any longer, the subproject can be allowed to
>>> >> bitrot without getting in anyone's way. On the other hand, I'm not sure
>>> >> of the details with Cassandra, but in general, a subproject should not
>>> >> have "the ability to block progress" just because it is contained in the
>>> >> Beam uber-repo.
>>> >>
>>> >> tl;dr Having an uber repo generally seems to work for Beam. Exceptions
>>> >> are few enough to be handled on a case-by-case basis.
>>> >>
>>> >> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold
>>> >> <[email protected]> wrote:
>>> >>>
>>> >>> Generic question without commenting on Twister2 specifically:
>>> >>>
>>> >>> Should runners, current and future, be in the same repository as Beam
>>> >>> core? Can or should they be completely separate products with their
>>> >>> own release cycles?
>>> >>>
>>> >>> Generally, loose coupling leads to more maintainable, reliable
>>> >>> projects. Specifically, Cassandra is holding back some other changes
>>> >>> in Beam and I really wish it didn't have the ability to block
>>> >>> progress. The more different runners we have in core, the worse this
>>> >>> problem is likely to become.
>>> >>>
>>> >>>
>>> >>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe
>>> >>> <[email protected]> wrote:
>>> >>> >
>>> >>> > Hi
>>> >>> >
>>> >>> > I believe the pull request is pretty complete now with the help of
>>> >>> > Ismaël.
>>> >>> > Kenn, would you be able to take a look at it and suggest any
>>> >>> > changes if needed?. The build checks and validations tests are
>>> >>> > passing at the moment. I will start working on the documentation
>>> >>> > that you mentioned in an earlier email separately.
>>> >>> >
>>> >>> > Best Regards,
>>> >>> > Pulasthi
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe
>>> >>> > <[email protected]> wrote:
>>> >>> >>
>>> >>> >> Hi All,
>>> >>> >>
>>> >>> >> I have created the initial pull request [1] to contribute the
>>> >>> >> Twister2 Beam runner to the Apache Beam codebase. More information
>>> >>> >> on Twister2 can be found here[2] and the Twister2 codebase is
>>> >>> >> available here[3]. At the moment only batch mode is supported in the
>>> >>> >> runner, but we are planning to add stream support and implement a
>>> >>> >> portable runner for Twister2 in the near future.
>>> >>> >>
>>> >>> >> As Kenn pointed out in an earlier email it would be great to have
>>> >>> >> inputs from the community regarding this contribution since it is a
>>> >>> >> sizable one. I am sure there are many improvements that can be done
>>> >>> >> in the contributed codebase with input from the community.
>>> >>> >>
>>> >>> >> [1] https://github.com/apache/beam/pull/10888
>>> >>> >> [2] https://twister2.org/
>>> >>> >> [3] https://github.com/DSC-SPIDAL/twister2
>>> >>> >>
>>> >>> >> Best Regards,
>>> >>> >> Pulasthi
>>> >>> >> --
>>> >>> >> Pulasthi S. Wickramasinghe
>>> >>> >> PhD Candidate | Research Assistant
>>> >>> >> School of Informatics and Computing | Digital Science Center
>>> >>> >> Indiana University, Bloomington
>>> >>> >> cell: 224-386-9035
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > --
>>> >>> > Pulasthi S. Wickramasinghe
>>> >>> > PhD Candidate | Research Assistant
>>> >>> > School of Informatics and Computing | Digital Science Center
>>> >>> > Indiana University, Bloomington
>>> >>> > cell: 224-386-9035
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Elliotte Rusty Harold
>>> >>> [email protected]
>>
>>
>>
>> --
>> Pulasthi S. Wickramasinghe
>> PhD Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> cell: 224-386-9035
