One last thing: for any runner after this one, wouldn't it be a good acceptance criterion to accept only portable implementations from now on?
_/ _/ Alex Van Boxel

On Mon, Mar 9, 2020 at 10:42 PM Ismaël Mejía <[email protected]> wrote:

> Good points Kenn. I think we mostly agree on what has been discussed in this thread, the pros/cons of having runners on our repository, but this is probably not the best moment in time to change any policy in that aspect.
>
> So if nobody objects I think we can proceed. I am OOO this week so with less time to continue with the code review, but I will be back to finish the review and hopefully finally get this merged with Pulasthi next week (sorry for the delay).
>
> > (don't wait for me on code review - if Ismaël said it is good, then it is good.)
>
> Thanks for your confidence. The Twister2 runner looks good so far, but I will confirm 100% next week :) In the meantime, if someone has some extra cycles to take a look, extra feedback is always welcome.
>
> On Mon, Mar 9, 2020 at 5:50 AM Kenneth Knowles <[email protected]> wrote:
> >
> > I haven't heard anyone suggest that we need a vote. I haven't heard anyone object to this being merged to master. Some time ago, we mostly decided to favor master instead of branches, because it is so much smoother for contributors and users.
> >
> > So I am poking this thread one last time and otherwise I would consider it consensus that once code review is done the runner is a part of Beam (experimental!).
> >
> > (don't wait for me on code review - if Ismaël said it is good, then it is good.)
> >
> > Kenn
> >
> > On Fri, Mar 6, 2020 at 7:47 AM Pulasthi Supun Wickramasinghe <[email protected]> wrote:
> >>
> >> I understand that the discussion is on a broader level than the Twister2 runner. From my experience developing the runner, the main advantage of being inside the Beam project was the easy access to the wide range of tests and other core/utility code, as Kyle pointed out.
> >> Unmerging runners that are not properly maintained and updated would be the most logical path to follow since the internals of the runners are only well understood by developers of that particular project. It would be unreasonable to expect the Beam community to maintain them. And since the runners do not alter the core APIs I assume they would be easy to unmerge if the need arises.
> >>
> >> Talking specifically about the Twister2 runner, we hope to continue developing the runner in the future to add both streaming capability and develop a portable runner as well. The team behind Twister2 is working towards the goal to get the project into Apache Incubator in the near future (hopefully to submit the proposal in the next couple of months).
> >>
> >> Best Regards,
> >> Pulasthi
> >>
> >> On Thu, Mar 5, 2020 at 6:56 PM Robert Bradshaw <[email protected]> wrote:
> >>>
> >>> I think we will get to a point where it makes sense for runners to live in their own repositories, with their own release cadence, but we're not at that point yet. One prerequisite is a stable API--we're closing in on that with the portability protos, but many (java) runners actually share the common runner core libraries and that is even less set in stone.
> >>>
> >>> On the other hand, taking responsibility for maintaining all runners is not a tenable or scalable position for the Beam project. If a runner is merged, it should be understood that it can be "un-merged" if it causes a maintenance burden. A completely separate project/repository makes this less messy.
> >>>
> >>> On Thu, Mar 5, 2020 at 10:01 AM Kenneth Knowles <[email protected]> wrote:
> >>> >
> >>> > I agree with both of you, mostly :-)
> >>> >
> >>> > The monorepo approach doesn't work/scale well for shipped libraries (name a Google library that silently just works and never causes any dependency problems) and the pain we feel has been constant and increasing, but I don't think we are at the breaking point.
> >>> >
> >>> > But Google's big monorepo [1] demonstrates similar benefits to what Kyle describes. In the early stages the benefit of not having to think too hard about build/test infra and share it everywhere is a big help, and it scales well. Eventually, shipping test utility libraries and compliance suites can be equivalent. And to your point - it is very helpful for users to know that they can use CassandraIO with the other Beam artifacts. This is why Google requires the whole big repo to depend on a single version of any externally-controlled artifact. But, yes, as a consequence it is preposterously difficult to stay up to date, since literally anything can block progress. You need a unified escalation chain for that policy to make sense. It is the definition of a healthy Apache project to *not* have that (PMC is different).
> >>> >
> >>> > Independent dependencies, independent git histories, and independent release cadence/process are all separate discussions.
> >>> >
> >>> > It is a broader question than this particular contribution, so let's merge this runner before changing our whole way of doing things :-)
> >>> >
> >>> > Kenn
> >>> >
> >>> > [1] https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext (really quite a balanced analysis)
> >>> >
> >>> > On Wed, Mar 4, 2020 at 11:51 AM Kyle Weaver <[email protected]> wrote:
> >>> >>
> >>> >> > Should runners, current and future, be in the same repository as Beam core?
> >>> >>
> >>> >> In the distant past, runners lived in their own repositories, and then were donated to Beam. But Beam's current uber-repo setup allows a lot of convenience. For example, a ton of code (including core functionality and tests) is shared directly between runners, which is useful for keeping runners up to date and ensuring consistent behavior between them (in other words, maintainable and reliable).
> >>> >>
> >>> >> Generally, it is up to the authors of a particular Beam related project/subproject to decide whether to host their code in Beam or in a different repo, and up to the community to decide whether to take on the donation, as discussed in previous threads on the Twister2 runner. In this case, it seems there is agreement between the Twister2 runner authors and the community that the runner can be hosted in Beam proper.
> >>> >>
> >>> >> There are examples of successful independent Beam projects, such as Spotify's Scio, but having an independent project with its own releases requires a lot of dedicated resources, and the bar for entry for extending Beam should not be that high. All that's required of subproject authors is that they keep the subproject in step with Beam. If they can't maintain it any longer, the subproject can be allowed to bitrot without getting in anyone's way. On the other hand, I'm not sure of the details with Cassandra, but in general, a subproject should not have "the ability to block progress" just because it is contained in the Beam uber-repo.
> >>> >>
> >>> >> tl;dr Having an uber repo generally seems to work for Beam. Exceptions are few enough to be handled on a case-by-case basis.
> >>> >>
> >>> >> On Wed, Mar 4, 2020 at 11:12 AM Elliotte Rusty Harold <[email protected]> wrote:
> >>> >>>
> >>> >>> Generic question without commenting on Twister2 specifically:
> >>> >>>
> >>> >>> Should runners, current and future, be in the same repository as Beam core?
> >>> >>> Can or should they be completely separate products with their own release cycles?
> >>> >>>
> >>> >>> Generally, loose coupling leads to more maintainable, reliable projects. Specifically, Cassandra is holding back some other changes in Beam and I really wish it didn't have the ability to block progress. The more different runners we have in core, the worse this problem is likely to become.
> >>> >>>
> >>> >>> On Wed, Mar 4, 2020 at 2:03 PM Pulasthi Supun Wickramasinghe <[email protected]> wrote:
> >>> >>> >
> >>> >>> > Hi
> >>> >>> >
> >>> >>> > I believe the pull request is pretty complete now with the help of Ismaël. Kenn, would you be able to take a look at it and suggest any changes if needed? The build checks and validation tests are passing at the moment. I will start working on the documentation that you mentioned in an earlier email separately.
> >>> >>> >
> >>> >>> > Best Regards,
> >>> >>> > Pulasthi
> >>> >>> >
> >>> >>> > On Tue, Feb 18, 2020 at 1:45 PM Pulasthi Supun Wickramasinghe <[email protected]> wrote:
> >>> >>> >>
> >>> >>> >> Hi All,
> >>> >>> >>
> >>> >>> >> I have created the initial pull request [1] to contribute the Twister2 Beam runner to the Apache Beam codebase. More information on Twister2 can be found here [2] and the Twister2 codebase is available here [3]. At the moment only batch mode is supported in the runner, but we are planning to add stream support and implement a portable runner for Twister2 in the near future.
> >>> >>> >>
> >>> >>> >> As Kenn pointed out in an earlier email it would be great to have inputs from the community regarding this contribution since it is a sizable one. I am sure there are many improvements that can be done in the contributed codebase with input from the community.
> >>> >>> >>
> >>> >>> >> [1] https://github.com/apache/beam/pull/10888
> >>> >>> >> [2] https://twister2.org/
> >>> >>> >> [3] https://github.com/DSC-SPIDAL/twister2
> >>> >>> >>
> >>> >>> >> Best Regards,
> >>> >>> >> Pulasthi
> >>> >>> >> --
> >>> >>> >> Pulasthi S. Wickramasinghe
> >>> >>> >> PhD Candidate | Research Assistant
> >>> >>> >> School of Informatics and Computing | Digital Science Center
> >>> >>> >> Indiana University, Bloomington
> >>> >>> >> cell: 224-386-9035
> >>> >>> >
> >>> >>> > --
> >>> >>> > Pulasthi S. Wickramasinghe
> >>> >>> > PhD Candidate | Research Assistant
> >>> >>> > School of Informatics and Computing | Digital Science Center
> >>> >>> > Indiana University, Bloomington
> >>> >>> > cell: 224-386-9035
> >>> >>>
> >>> >>> --
> >>> >>> Elliotte Rusty Harold
> >>> >>> [email protected]
> >>
> >> --
> >> Pulasthi S. Wickramasinghe
> >> PhD Candidate | Research Assistant
> >> School of Informatics and Computing | Digital Science Center
> >> Indiana University, Bloomington
> >> cell: 224-386-9035
>
