It sounds like a good idea to me.
On 10/18/2016 08:08 PM, Amit Sela wrote:
@Jesse how about runners "tracing" the DAG constructed by Beam, so that
it's clear what the runner actually executed?
For the SparkRunner, a ParDo translates to a mapPartitions transformation.
That could provide transparency when debugging/benchmarking pipelines.
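
To make the tracing idea concrete, here is a minimal sketch in plain Python. It does not use Beam or Spark; `par_do`, `map_partitions`, and `traced` are hypothetical stand-ins that only mimic the shape of a per-element ParDo being run as a per-partition function, while recording which runner operation each Beam transform was translated to.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical trace of (Beam transform name, runner operation) pairs.
trace: List[Tuple[str, str]] = []


def par_do(do_fn: Callable[[str], Iterable[str]]) -> Callable[[Iterator[str]], Iterator[str]]:
    """Wrap a per-element function as a per-partition function."""
    def process_partition(partition: Iterator[str]) -> Iterator[str]:
        for element in partition:
            # A DoFn may emit zero or more outputs per input element.
            yield from do_fn(element)
    return process_partition


def traced(beam_name: str, runner_op: str, fn):
    """Record which runner operation a Beam transform was translated to."""
    trace.append((beam_name, runner_op))
    return fn


def map_partitions(partitions: List[List[str]], fn) -> List[List[str]]:
    """Stand-in for a runner's per-partition operation (assumption, not Spark)."""
    return [list(fn(iter(p))) for p in partitions]


def split_words(line: str) -> List[str]:
    return line.split()


partitions = [["a b", "c"], ["d e f"]]
result = map_partitions(
    partitions,
    traced("ParDo(split_words)", "mapPartitions", par_do(split_words)),
)
```

After this runs, `trace` holds one entry pairing the Beam-level transform with the runner-level operation, which is the kind of record that could be printed next to benchmark numbers.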
On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com> wrote:
@Dan before starting with Beam, I'd want to know how much performance I'm
giving up by not programming directly to the API.
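
One way to put a number on "how much performance I'm giving up" is a relative overhead between the Beam version and the native version of the same job. The sketch below is an assumption about how that could be measured, not an existing Beam tool; `time_runs` and `relative_overhead` are hypothetical helper names, and the callables passed in would be whatever launches each pipeline.

```python
import statistics
import time
from typing import Callable, List


def time_runs(job: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run a job several times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


def relative_overhead(beam_seconds: List[float], native_seconds: List[float]) -> float:
    """Fractional slowdown of the Beam runs relative to the native runs.

    Medians are used so a single slow outlier run does not dominate.
    """
    beam = statistics.median(beam_seconds)
    native = statistics.median(native_seconds)
    return (beam - native) / native


# Illustrative, made-up timings: median 12.0 s vs. median 10.0 s.
overhead = relative_overhead([12.0, 11.0, 13.0], [10.0, 9.0, 11.0])
```

With these made-up timings the overhead is 0.2, meaning the Beam version took about 20% longer than the native one.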
On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin <dhalp...@google.com.invalid> wrote:
I think there are lots of excellent one-off performance studies, but I'm
not sure how useful that is to Beam.
From a test infra point of view, I'm wondering more about tracking of
performance over time, identifying regressions, etc.
Google has some tools like PerfKit
<https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
basically a skin on a database + some scripts to load and query data; I
don't love it. Do other Apache projects do public, long-term benchmarking
and performance regression testing?
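
As a rough illustration of tracking performance over time and flagging regressions, here is a minimal sketch. It assumes results are stored as chronological (run id, seconds) pairs; `find_regressions`, the window size, and the 10% threshold are hypothetical choices, not part of PerfKit or Beam.

```python
import statistics
from typing import List, Tuple


def find_regressions(
    history: List[Tuple[str, float]],
    window: int = 5,
    threshold: float = 0.10,
) -> List[Tuple[str, float, float]]:
    """Flag runs slower than the median of the preceding `window` runs.

    `history` is a chronological list of (run_id, seconds). A run is flagged
    when it exceeds the rolling baseline by more than `threshold` (a fraction).
    Returns (run_id, seconds, baseline) for each flagged run.
    """
    regressions = []
    for i in range(window, len(history)):
        baseline = statistics.median([s for _, s in history[i - window:i]])
        run_id, seconds = history[i]
        if seconds > baseline * (1 + threshold):
            regressions.append((run_id, seconds, baseline))
    return regressions


# Made-up nightly timings; only the last run is noticeably slower.
history = [
    ("r1", 10.0), ("r2", 10.2), ("r3", 9.9), ("r4", 10.1),
    ("r5", 10.0), ("r6", 10.3), ("r7", 12.0),
]
flagged = find_regressions(history)
```

Comparing against a rolling median rather than the single previous run makes the check less sensitive to one noisy measurement, at the cost of reacting more slowly to gradual drift.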
On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com> wrote:
I found data Artisans' benchmarking post
They also shared the code <https://github.com/dataArtisans/performance
I didn't dig in much, but they covered a wide range of algorithms. They have
native code, so you write the Beam code and check against the native
On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
Hi Jason, I have been busy benchmarking Flink Cluster (Spark next)
Beam. I can share my experience. Can you list items of interest to you, so I
can answer them to the best of my knowledge? Cheers
From: Jason Kuster <jasonkus...@google.com.INVALID>
Sent: Monday, October 17, 2016 5:06 PM
Subject: Exploring Performance Testing
Now that we've covered some of the initial ground with regard to
correctness testing, I'm going to be starting work on performance testing
and benchmarking. I wanted to reach out and see what people's experiences
have been with performance testing and benchmarking
frameworks, particularly in other Apache projects. Anyone have any
experience or thoughts?
Apache Beam (Incubating) / Google Cloud Dataflow
Talend - http://www.talend.com