It sounds like a good idea to me.
Regards
JB
On 10/18/2016 08:08 PM, Amit Sela wrote:
@Jesse, how about having runners "trace" the DAG constructed by Beam, so that
it's clear what the runner actually executed?
Example:
For the SparkRunner, a ParDo translates to a mapPartitions transformation.
That could provide transparency when debugging or benchmarking pipelines on
each runner.
On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson wrote:
@Dan, before starting with Beam, I'd want to know how much performance I'm
giving up by not programming directly against the native API.
On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin wrote:
I think there are lots of excellent one-off performance studies, but I'm
not sure how useful those are to Beam.
From a test infra point of view, I'm more interested in tracking
performance over time, identifying regressions, etc.
Google has some tools, like PerfKit
<https://github.com/GoogleCloudPlatform/PerfKitBenchmarker>, which is
basically a skin on a database plus some scripts to load and query data,
but I don't love it. Do other Apache projects do public, long-term
benchmarking and performance regression testing?
Dan
On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson wrote:
I found data Artisans' benchmarking post
<http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
They also shared the code <https://github.com/dataArtisans/performance>.
I didn't dig in much, but they covered a wide range of algorithms. They have
the native code, so you could write the Beam version and check it against
the native performance.
On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari wrote:
Hi Jason,
I have been busy benchmarking a Flink cluster (Spark next) under Beam, and I
can share my experience. Can you list the items you're interested in, so I
can answer them to the best of my knowledge?
Cheers
From: Jason Kuster
Sent: Monday, October 17, 2016 5:06 PM
Subject: Exploring Performance Testing
Hey all,
Now that we've covered some of the initial ground with regard to
correctness testing, I'm going to start working on performance testing
and benchmarking. I wanted to reach out and see what people's experiences
have been with performance testing and benchmarking frameworks,
particularly in other Apache projects. Does anyone have any experience or
thoughts?
Best,
Jason
--
-------
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow
--
Jean-Baptiste Onofré
http://blog.nanthrax.net
Talend - http://www.talend.com