Hello,

Now that we are discussing performance testing, I want to jump into the conversation to remind everybody that we have a really interesting benchmarking suite, already contributed by Google, that has (sadly) not been merged yet.

https://github.com/apache/incubator-beam/pull/366
https://issues.apache.org/jira/browse/BEAM-160

This is not exactly the kind of benchmark under discussion here, but for me it is a super valuable contribution that I hope we can use/refine to evaluate the runners.

Ismaël Mejía

On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> It sounds like a good idea to me.
>
> Regards
> JB
>
> On 10/18/2016 08:08 PM, Amit Sela wrote:
>
>> @Jesse how about runners "tracing" the constructed DAG (by Beam) so that
>> it's clear what the runner actually executed?
>>
>> Example:
>> For the SparkRunner, a ParDo translates to a mapPartitions transformation.
>>
>> That could provide transparency when debugging/benchmarking pipelines
>> per-runner.
>>
>> On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com>
>> wrote:
>>
>>> @Dan before starting with Beam, I'd want to know how much performance
>>> I'm giving up by not programming directly to the API.
>>>
>>> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin
>>> <dhalp...@google.com.invalid> wrote:
>>>
>>>> I think there are lots of excellent one-off performance studies, but
>>>> I'm not sure how useful that is to Beam.
>>>>
>>>> From a test infra point of view, I'm wondering more about tracking of
>>>> performance over time, identifying regressions, etc.
>>>>
>>>> Google has some tools like PerfKit
>>>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
>>>> basically a skin on a database + some scripts to load and query data;
>>>> but I don't love it. Do other Apache projects do public, long-term
>>>> benchmarking and performance regression testing?
>>>>
>>>> Dan
>>>>
>>>> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson
>>>> <je...@smokinghand.com> wrote:
>>>>
>>>>> I found data Artisans' benchmarking post
>>>>> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
>>>>> They also shared the code
>>>>> <https://github.com/dataArtisans/performance>. I didn't dig in much,
>>>>> but they did a wide range of algorithms. They have the native code,
>>>>> so you write the Beam code and check against the native performance.
>>>>>
>>>>> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
>>>>> <amirto...@yahoo.com.invalid> wrote:
>>>>>
>>>>>> Hi Jason, I have been busy benchmarking Flink Cluster (Spark next)
>>>>>> under Beam. I can share my experience. Can you list items of
>>>>>> interest to know so I can answer them to the best of my knowledge.
>>>>>> Cheers
>>>>>>
>>>>>> From: Jason Kuster <jasonkus...@google.com.INVALID>
>>>>>> To: dev@beam.incubator.apache.org
>>>>>> Sent: Monday, October 17, 2016 5:06 PM
>>>>>> Subject: Exploring Performance Testing
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> Now that we've covered some of the initial ground with regard to
>>>>>> correctness testing, I'm going to be starting work on performance
>>>>>> testing and benchmarking. I wanted to reach out and see what
>>>>>> people's experiences have been with performance testing and
>>>>>> benchmarking frameworks, particularly in other Apache projects.
>>>>>> Anyone have any experience or thoughts?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jason
>>>>>>
>>>>>> --
>>>>>> -------
>>>>>> Jason Kuster
>>>>>> Apache Beam (Incubating) / Google Cloud Dataflow
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com