I think there are lots of excellent one-off performance studies, but I'm not sure how useful those are to Beam.
From a test infra point of view, I'm wondering more about tracking of performance over time, identifying regressions, etc.

Google has some tools like PerfKit <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is basically a skin on a database + some scripts to load and query data; but I don't love it.

Do other Apache projects do public, long-term benchmarking and performance regression testing?

Dan

On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <[email protected]> wrote:

> I found data Artisan's benchmarking post
> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> They also shared the code <https://github.com/dataArtisans/performance>. I
> didn't dig in much, but they did a wide range of algorithms. They have the
> native code, so you write the Beam code and check against the native
> performance.
>
> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari <[email protected]>
> wrote:
>
> > Hi Jason, I have been busy bench-marking Flink Cluster (Spark next) under
> > Beam. I can share my experience. Can you list items of interest to know so I
> > can answer them to the best of my knowledge. Cheers
> >
> > From: Jason Kuster <[email protected]>
> > To: [email protected]
> > Sent: Monday, October 17, 2016 5:06 PM
> > Subject: Exploring Performance Testing
> >
> > Hey all,
> >
> > Now that we've covered some of the initial ground with regard to
> > correctness testing, I'm going to be starting work on performance testing
> > and benchmarking. I wanted to reach out and see what people's experiences
> > have been with performance testing and benchmarking
> > frameworks, particularly in other Apache projects. Anyone have any
> > experience or thoughts?
> >
> > Best,
> >
> > Jason
> >
> > --
> > -------
> > Jason Kuster
> > Apache Beam (Incubating) / Google Cloud Dataflow
