I agree. Borrowing the mutation detection from the DirectRunner as an intermediate step sounds like a good idea.
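
To make the idea concrete, here is a rough sketch of encode-and-compare mutation detection: snapshot an element's encoded bytes before handing it to user code, then re-encode afterwards and complain if the bytes differ. This is not the DirectRunner's actual code. `MutationDetector` and `run_step` are hypothetical names, and `pickle` only stands in for a real coder.

```python
import pickle


class MutationDetector:
    """Snapshot an element's encoded form so a later change can be detected."""

    def __init__(self, element, encode=pickle.dumps):
        # `encode` stands in for a coder; pickle is an assumption for this sketch.
        self._element = element
        self._encode = encode
        self._snapshot = encode(element)

    def verify_unchanged(self):
        # Re-encode the same object and compare against the original bytes.
        if self._encode(self._element) != self._snapshot:
            raise RuntimeError(
                f"element was mutated after being emitted: {self._element!r}"
            )


def run_step(fn, element):
    """Pass `element` to user code `fn` without cloning, then check for mutation."""
    detector = MutationDetector(element)
    result = fn(element)
    detector.verify_unchanged()
    return result
```

The check costs two encodes per element where a clone costs an encode plus a decode, so the two should be in the same ballpark, as Kenn points out below. A runner could also log a warning instead of raising, which would fit the "directly warn" suggestion.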

On Mon, Dec 21, 2020 at 8:57 AM Kenneth Knowles <[email protected]> wrote:

> I really think we should make a plan to make this the default. If you test
> with the DirectRunner it will do mutation checking and catch pipelines that
> depend on the runner cloning every element. (Also, the DirectRunner doesn't
> clone.) Since the cloning is similar in cost to the mutation detection,
> could we actually add some mutation detection to FlinkRunner pipelines and
> also directly warn if a pipeline is depending on it?
>
> Kenn
>
> On Mon, Dec 21, 2020 at 5:04 AM Teodor Spæren <[email protected]>
> wrote:
>
>> Hey! My option is not default as of now, since it can break pipelines
>> which rely on the faulty Flink implementation. I'm creating my own
>> benchmarks locally and will run against those, but the idea of adding it
>> to the official benchmark runs sounds interesting, thanks for bringing
>> it up!
>>
>> Teodor
>>
>> On Tue, Dec 15, 2020 at 06:51:38PM -0800, Ahmet Altay wrote:
>> >Hi Teodor,
>> >
>> >Thank you for working on this. If I remember correctly, there were some
>> >opportunities to improve in the previous paper (e.g. not focusing on
>> >deprecated runners, long running benchmarks, varying data sizes). And I am
>> >excited that you are keeping the community as part of your research process
>> >and we will be happy to help you where we can.
>> >
>> >Related to your question. Was the new option used by default? If that
>> >is the case you will probably see its impact on the metrics dashboard [1].
>> >And if it is not on by default, you can add your variant as a new benchmark
>> >and compare the difference across many runs in a controlled benchmarking
>> >environment. Would that help?
>> >
>> >Ahmet
>> >
>> >[1] http://metrics.beam.apache.org/d/1/getting-started?orgId=1
>> >
>> >
>> >On Tue, Dec 15, 2020 at 5:48 AM Teodor Spæren <[email protected]>
>> >wrote:
>> >
>> >> Hey!
>> >>
>> >> Yeah, that paper was what prompted my master thesis! I definitely will
>> >> post here, once I get more data :)
>> >>
>> >> Teodor
>> >>
>> >> On Mon, Dec 14, 2020 at 06:56:30AM -0600, Rion Williams wrote:
>> >> >Hi Teodor,
>> >> >
>> >> >Although I’m sure you’ve come across it, this might have some valuable
>> >> >resources or methodologies to consider as you explore this a bit more:
>> >> >
>> >> >https://arxiv.org/pdf/1907.08302.pdf
>> >> >
>> >> >I’m looking forward to reading about your findings, especially using a
>> >> >more recent iteration of Beam!
>> >> >
>> >> >Rion
>> >> >
>> >> >> On Dec 14, 2020, at 6:37 AM, Teodor Spæren <[email protected]> wrote:
>> >> >>
>> >> >> Just bumping this so people see it now that 2.26.0 is out :)
>> >> >>
>> >> >>> On Wed, Nov 25, 2020 at 11:09:52AM +0100, Teodor Spæren wrote:
>> >> >>> Hey!
>> >> >>>
>> >> >>> My name is Teodor Spæren and I'm writing a master thesis investigating
>> >> >>> the performance overhead of using Beam instead of using the underlying
>> >> >>> systems directly. My focus has been on Flink and I've made a discovery
>> >> >>> about some unnecessary copying between operators in the Flink
>> >> >>> runner[1][2]. I wrote a fix for this and it got accepted and merged,
>> >> >>> and will be in the upcoming 2.26.0 release[3].
>> >> >>>
>> >> >>> I'm writing this email to ask if anyone on these mailing lists would
>> >> >>> be willing to send me some results of applying this option when the new
>> >> >>> version of Beam releases. Anything will be very much appreciated: stories,
>> >> >>> screenshots of performance monitoring before and after, hard numbers,
>> >> >>> anything! If you include the cluster size and the workload that would be
>> >> >>> awesome too! My master thesis is set to be complete the coming summer, so
>> >> >>> there is no real hurry :)
>> >> >>>
>> >> >>> The thesis will be freely accessible[4] and I hope that these findings
>> >> >>> will be of help to the Beam community. If anyone wishes to submit stories,
>> >> >>> but remain anonymous that is also ok :)
>> >> >>>
>> >> >>> The best way to contact me would be to send an email my way here, or
>> >> >>> on [email protected].
>> >> >>>
>> >> >>> Any help is appreciated, thanks for your attention!
>> >> >>>
>> >> >>> Best regards,
>> >> >>> Teodor Spæren
>> >> >>>
>> >> >>>
>> >> >>> [1]: https://lists.apache.org/thread.html/r24129dba98782e1cf4d18ec738ab9714dceb05ac23f13adfac5baad1%40%3Cdev.beam.apache.org%3E
>> >> >>> [2]: https://issues.apache.org/jira/browse/BEAM-11146
>> >> >>> [3]: https://github.com/apache/beam/pull/13240
>> >> >>> [4]: https://www.duo.uio.no/
