Do you want a benchmark to identify regressions, or one to spur competition
between the different implementations and let them battle it out for
supremacy? You are going to get people using it for both no matter what you
do, so you should plan on supporting both.
There are several problems that need to be tackled for a good benchmark. The
primary one that I have run into is that performance in distributed systems is
tied very closely to the hardware being used (CPU/memory/network), and just as
importantly to the hardware's failure rate (especially soft failures that are
automatically recovered from). Even with dedicated hardware to run the tests
on, the failure rate of the hardware increases over time, so an old run
compared to a recent run is not an apples-to-apples comparison. My suggestion
would be to focus on an automated way to build and deploy different versions
to the same hardware, and not to worry about which DB to store the results in,
because the results are not going to be truly comparable over time.
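To make the "build and deploy different versions to the same hardware" idea
concrete, here is a rough sketch of the kind of per-run record I have in mind.
The class and fields are hypothetical, not anything Beam provides today; the
point is just that every result carries the build and the host it ran on, so
you only compare runs from the same environment:

    import java.net.InetAddress;
    import java.time.Instant;
    import java.util.concurrent.TimeUnit;

    /** Hypothetical record of one benchmark run, keyed by build and host so
     *  results are grouped by environment instead of compared blindly over time. */
    public class BenchmarkRun {
      final String gitSha;      // version of Beam / the runner under test
      final String hostName;    // which box the run actually used
      final Instant startedAt;  // when the run happened
      long wallClockMillis;     // end-to-end runtime of the job

      BenchmarkRun(String gitSha) throws Exception {
        this.gitSha = gitSha;
        this.hostName = InetAddress.getLocalHost().getHostName();
        this.startedAt = Instant.now();
      }

      /** Times an arbitrary job (e.g. a pipeline submission) and records it. */
      void measure(Runnable job) {
        long start = System.nanoTime();
        job.run();
        wallClockMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      }
    }

Whatever ends up storing these records matters much less than making sure the
gitSha/hostName pair is always there to group by.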
The second problem is which benchmarks to write. I think you need a
combination of micro-benchmarks and real-world benchmarks. Micro-benchmarks
should show most performance regressions/gains. Real-world benchmarks are
likely going to show how these changes impact actual users (most of whom
probably are not trying to push 100 GB/sec through). One thing we learned from
the Yahoo streaming benchmark (full disclosure: my team and I wrote it), and
from subsequent conversations with people who worked on the Flink and Apex
updates to it, is that in a well-written streaming system the bottleneck will
likely be external tools, specifically state stores.
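For the micro-benchmark side, even timing one tiny pipeline per runner will
catch gross regressions. A minimal sketch, assuming the Beam Java SDK; the
class name, loop count, and the doubling DoFn are placeholders for real
per-element work, and where exactly you start the clock depends on the runner:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.DoFn.ProcessElement;
    import org.apache.beam.sdk.transforms.ParDo;

    public class ParDoMicroBenchmark {
      public static void main(String[] args) {
        for (int run = 0; run < 5; run++) {  // repeat to smooth out noise
          Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
          p.apply(Create.of(1, 2, 3, 4, 5))
           .apply(ParDo.of(new DoFn<Integer, Integer>() {
             @ProcessElement
             public void processElement(ProcessContext c) {
               c.output(c.element() * 2);  // stand-in for real per-element work
             }
           }));
          long start = System.nanoTime();
          p.run().waitUntilFinish();  // measure submission plus execution
          System.out.printf("run %d: %d ms%n", run, (System.nanoTime() - start) / 1_000_000);
        }
      }
    }

A real-world benchmark is roughly the same shape pointed at real sources and a
real state store, which is exactly where those external bottlenecks show up.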
The third problem is what to report as the results of the benchmark. If you
get adoption for this benchmark, people will optimize for what you report, so
pick it wisely. In the Yahoo streaming benchmark we concentrated on latency
vs. throughput, specifically at the very low end of latency. People on the
Spark project were not happy with this, because Spark is not designed for
sub-second latency, so it really was not a fair comparison for use cases that
don't need sub-second latency. We completely neglected resource utilization
and cost. Really, people want to know a few things when deciding whether to
upgrade to a new version or switch to a different underlying engine, and the
priority of these things may change based on the use case:
1) How will the cost change for me? Raw $ for the cloud, and how much more I
can cram onto my boxes for dedicated hardware (a quick cost sketch follows
this list).
2) How will the performance change for my use case? (latency/throughput)
3) Unrelated to the benchmark: what are the different features that will make
my life simpler?
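For point 1, even a back-of-the-envelope normalization goes a long way. Every
number below is made up for the example; the point is just to report a
cost-per-unit-of-work figure instead of raw throughput:

    /** Toy cost normalization: dollars per billion events processed.
     *  Every figure here is illustrative, not a measurement. */
    public class CostPerWork {
      public static void main(String[] args) {
        double instances = 10;                    // nodes used by the job
        double dollarsPerInstanceHour = 0.50;     // cloud list price
        double jobHours = 2.0;                    // wall-clock runtime of the benchmark
        double eventsProcessed = 5_000_000_000d;  // work done in that time

        double totalCost = instances * dollarsPerInstanceHour * jobHours;       // $10.00
        double costPerBillion = totalCost / (eventsProcessed / 1_000_000_000d); // $2.00
        System.out.printf("$%.2f total, $%.2f per billion events%n",
            totalCost, costPerBillion);
      }
    }

The same number answers the dedicated-hardware version of the question if you
swap the hourly price for what else the boxes could have been running instead.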
The recent Impala benchmark comparing it to Redshift
(https://blog.cloudera.com/blog/2016/09/apache-impala-incubating-vs-amazon-redshift-s3-integration-elasticity-agility-and-cost-performance-benefits-on-aws/)
did, I think, a decent job of this, answering 1 and 2 for some very specific
setups.
- Bobby
On Tuesday, October 18, 2016 3:36 PM, Lukasz Cwik
<[email protected]> wrote:
FYI, there is an outstanding PR about adding the Nexmark suite:
https://github.com/apache/incubator-beam/pull/366
On Tue, Oct 18, 2016 at 1:12 PM, Ismaël Mejía <[email protected]> wrote:
> @Jason, just some additional refs for ideas, since I already researched a
> little bit about how people evaluated this in other Apache projects.
>
> Yahoo published a benchmarking analysis of different streaming frameworks
> about a year ago:
> https://github.com/yahoo/streaming-benchmarks
>
> And the flink guys extended it:
> https://github.com/dataArtisans/yahoo-streaming-benchmark
>
> Notice that the common approach comes from the classical database world: it
> is to take one of the TPC query suites (TPC-H or TPC-DS) and evaluate a data
> processing framework against it. Spark does this to evaluate their SQL
> performance.
>
> https://github.com/databricks/spark-sql-perf
>
> However, this approach is not 100% aligned with Beam because AFAIK there is
> no TPC suite for continuous processing; that's the reason why I found the
> Nexmark suite to be a more appropriate example.
>
>
> On Tue, Oct 18, 2016 at 9:50 PM, Ismaël Mejía <[email protected]> wrote:
>
> > Hello,
> >
> > Now that we are discussing the subject of performance testing, I want to
> > jump into the conversation to remind everybody that we have a really
> > interesting benchmarking suite already contributed by Google that has
> > (sadly) not been merged yet.
> >
> > https://github.com/apache/incubator-beam/pull/366
> > https://issues.apache.org/jira/browse/BEAM-160
> >
> > This is not exactly the kind of benchmark under discussion here, but for
> > me it is a super valuable contribution that I hope we can use/refine to
> > evaluate the runners.
> >
> > Ismaël Mejía
> >
> >
> > On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré <[email protected]>
> > wrote:
> >
> >> It sounds like a good idea to me.
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 10/18/2016 08:08 PM, Amit Sela wrote:
> >>
> >>> @Jesse how about runners "tracing" the constructed DAG (by Beam) so that
> >>> it's clear what the runner actually executed?
> >>>
> >>> Example:
> >>> For the SparkRunner, a ParDo translates to a mapPartitions
> >>> transformation.
> >>>
> >>> That could provide transparency when debugging/benchmarking pipelines
> >>> per-runner.
> >>>
> >>> On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <[email protected]>
> >>> wrote:
> >>>
> >>>> @Dan before starting with Beam, I'd want to know how much performance
> >>>> I'm giving up by not programming directly to the API.
> >>>>
> >>>> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> I think there are lots of excellent one-off performance studies, but
> >>>>> I'm not sure how useful that is to Beam.
> >>>>>
> >>>>> From a test infra point of view, I'm wondering more about tracking of
> >>>>> performance over time, identifying regressions, etc.
> >>>>>
> >>>>> Google has some tools like PerfKit
> >>>>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
> >>>>> basically a skin on a database + some scripts to load and query data;
> >>>>> but I don't love it. Do other Apache projects do public, long-term
> >>>>> benchmarking and performance regression testing?
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> I found data Artisans' benchmarking post
> >>>>>> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> >>>>>> They also shared the code
> >>>>>> <https://github.com/dataArtisans/performance>. I didn't dig in much,
> >>>>>> but they did a wide range of algorithms. They have the native code, so
> >>>>>> you write the Beam code and check against the native performance.
> >>>>>>
> >>>>>> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi Jason, I have been busy benchmarking a Flink cluster (Spark next)
> >>>>>>> under Beam. I can share my experience. Can you list items of interest
> >>>>>>> so I can answer them to the best of my knowledge. Cheers
> >>>>>>>
> >>>>>>> From: Jason Kuster <[email protected]>
> >>>>>>> To: [email protected]
> >>>>>>> Sent: Monday, October 17, 2016 5:06 PM
> >>>>>>> Subject: Exploring Performance Testing
> >>>>>>>
> >>>>>>> Hey all,
> >>>>>>>
> >>>>>>> Now that we've covered some of the initial ground with regard to
> >>>>>>> correctness testing, I'm going to be starting work on performance
> >>>>>>> testing and benchmarking. I wanted to reach out and see what people's
> >>>>>>> experiences have been with performance testing and benchmarking
> >>>>>>> frameworks, particularly in other Apache projects. Anyone have any
> >>>>>>> experience or thoughts?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Jason
> >>>>>>>
> >>>>>>> --
> >>>>>>> -------
> >>>>>>> Jason Kuster
> >>>>>>> Apache Beam (Incubating) / Google Cloud Dataflow
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >> --
> >> Jean-Baptiste Onofré
> >> [email protected]
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
> >
>