Do you want a benchmark to identify regressions, or one to spur competition 
between the different implementations and let them battle it out for supremacy? 
You are going to get people using it for both no matter what you do, so you 
should plan on supporting both.
There are several problems that need to be tackled for a good benchmark.  The 
primary one that I have run into is that performance in distributed systems is 
tied very closely to the hardware being used (CPU/Memory/Network) and, just as 
importantly, to the hardware's failure rate (especially soft failures that are 
automatically recovered from).  Even with dedicated hardware to run the tests 
on, the failure rate of the hardware increases over time, so an old run compared 
to a recent run is not an apples-to-apples comparison.  My suggestion would be 
to focus on an automated way to build and deploy different versions to the same 
hardware, and not to worry about which DB to store the results in, because the 
results are not going to be truly comparable over time.
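
To make that concrete, here is a minimal sketch of what such an automated
harness might look like.  This is not Beam's actual tooling; the deploy.sh and
run_benchmark.sh scripts, the cluster name, and the result layout are
hypothetical placeholders.  The idea is just to build a specific revision,
deploy it to a fixed set of machines, run the benchmark, and record the
environment next to the raw numbers so every run is at least self-describing.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.time.Instant;
    import java.util.Arrays;

    /**
     * Minimal sketch of an automated build-and-deploy benchmark harness.
     * Builds one git revision, deploys it to a fixed cluster, runs the
     * benchmark, and stores environment metadata alongside the results so a
     * run can at least be interpreted later, even if it is never directly
     * comparable to much older runs.
     */
    public class BenchmarkHarness {

      public static void main(String[] args) throws Exception {
        String gitSha = args[0];  // revision under test
        Path resultDir =
            Paths.get("results", gitSha + "-" + Instant.now().toEpochMilli());
        Files.createDirectories(resultDir);

        run("git", "checkout", gitSha);
        run("mvn", "-q", "package", "-DskipTests");      // build the revision
        run("./deploy.sh", "cluster-a");                 // hypothetical deploy script
        run("./run_benchmark.sh", resultDir.toString()); // hypothetical benchmark driver

        // Record environment metadata next to the raw numbers.
        Files.write(resultDir.resolve("environment.txt"),
            Arrays.asList(
                "sha=" + gitSha,
                "cpus=" + Runtime.getRuntime().availableProcessors(),
                "timestamp=" + Instant.now()));
      }

      private static void run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
          throw new IllegalStateException("command failed: " + String.join(" ", cmd));
        }
      }
    }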
The second problem is which benchmarks to write.  I think you need a 
combination of micro-benchmarks and real-world benchmarks.  Micro-benchmarks 
should show most performance regressions/gains.  Real-world benchmarks are 
likely going to show how these changes impact actual users (most of whom 
probably are not trying to push 100 GB/sec through).  One thing we learned from 
the Yahoo streaming benchmark (full disclosure: my team and I wrote it), and 
from subsequent conversations with people who worked on the Flink and Apex 
updates to it, is that in a well-written streaming system, external tools, 
specifically state stores, will likely be the bottleneck.
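
To illustrate the state-store point, here is a small self-contained sketch
(plain Java, deliberately not tied to Beam or any runner) that times the same
per-record work with and without a simulated external state-store round trip.
The ~1 ms lookup delay and record count are assumed placeholders, but the
pattern shows how quickly the external store, rather than the streaming engine
itself, becomes the bottleneck.

    import java.util.concurrent.TimeUnit;

    /**
     * Toy micro-benchmark contrasting pure in-memory per-record work with the
     * same work plus a simulated external state-store call per record.  Not a
     * rigorous benchmark (no warmup control, no JMH), just an illustration of
     * where the bottleneck tends to move.
     */
    public class StateStoreBottleneck {

      public static void main(String[] args) throws Exception {
        int records = 2_000;

        long inMemoryMs = time(records, false);
        long withStoreMs = time(records, true);

        System.out.printf("in-memory:  %d records in %d ms%n", records, inMemoryMs);
        System.out.printf("with store: %d records in %d ms%n", records, withStoreMs);
      }

      private static long time(int records, boolean useExternalStore)
          throws InterruptedException {
        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < records; i++) {
          checksum += i * 31L;   // stand-in for per-record CPU work
          if (useExternalStore) {
            Thread.sleep(1);     // assumed ~1 ms round trip to an external state store
          }
        }
        if (checksum == 42) {
          System.out.println();  // keep the loop from being optimized away
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      }
    }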
The third problem is what to report as the results of the benchmark.  If you 
get adoption for this benchmark, people will optimize for what you report, so 
pick it wisely.  In the Yahoo streaming benchmark we concentrated on latency 
versus throughput, and specifically on the very low end of latency.  People on 
the Spark project were not happy with this because Spark is not designed for 
sub-second latency, so it really was not a fair comparison for use cases that 
don't need sub-second latency.  We completely neglected resource utilization 
and cost.  Really, people want to know a few things when deciding whether to 
upgrade to a new version or switch to a different underlying engine.  The 
priority of these things may change based on different use cases.
1) How will the cost change for me?  Raw $ for the cloud, and how much more can 
   I cram onto my boxes for dedicated hardware.
2) How will the performance change for my use case (latency/throughput)?
3) Unrelated to the benchmark: what are the different features that will make 
   my life simpler?
The recent Impala benchmark comparing it to Redshift 
(https://blog.cloudera.com/blog/2016/09/apache-impala-incubating-vs-amazon-redshift-s3-integration-elasticity-agility-and-cost-performance-benefits-on-aws/) 
did, I think, a decent job of this, answering 1 and 2 for some very specific 
setups; a rough sketch of that kind of cost-normalized reporting follows below.
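
As a loose illustration (not taken from any of the benchmarks mentioned above),
here is a minimal sketch of reporting 1 and 2 together: latency percentiles
plus throughput normalized by cluster cost.  The instance price, machine count,
and latency samples are made-up placeholder inputs, not real results.

    import java.util.Arrays;

    /**
     * Sketch of cost-normalized benchmark reporting: latency percentiles plus
     * throughput per dollar-hour.  All inputs below are placeholders.
     */
    public class BenchmarkReport {

      public static void main(String[] args) {
        // Hypothetical raw measurements from one run.
        double recordsPerSecond = 250_000;
        int machines = 10;
        double dollarsPerMachineHour = 0.50;  // assumed cloud instance price
        long[] latenciesMillis = {12, 15, 18, 22, 35, 40, 55, 80, 120, 400};

        double clusterCostPerHour = machines * dollarsPerMachineHour;
        double recordsPerDollarHour = recordsPerSecond * 3600 / clusterCostPerHour;

        Arrays.sort(latenciesMillis);
        System.out.printf("throughput: %.0f records/s on %d machines%n",
            recordsPerSecond, machines);
        System.out.printf("cost-normalized: %.0f records per $-hour%n",
            recordsPerDollarHour);
        System.out.printf("p50 latency: %d ms, p99 latency: %d ms%n",
            percentile(latenciesMillis, 50), percentile(latenciesMillis, 99));
      }

      // Nearest-rank percentile over an already-sorted array.
      private static long percentile(long[] sorted, int p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
      }
    }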

 - Bobby 

    On Tuesday, October 18, 2016 3:36 PM, Lukasz Cwik 
<[email protected]> wrote:
 

 FYI, there was an outstanding PR about adding the
Nexmark suite: https://github.com/apache/incubator-beam/pull/366

On Tue, Oct 18, 2016 at 1:12 PM, Ismaël Mejía <[email protected]> wrote:

> @Jason, just some additional refs for ideas, since I already researched a
> little bit about how people evaluated this in other Apache projects.
>
> Yahoo published a benchmarking analysis of different streaming frameworks
> about a year ago:
> https://github.com/yahoo/streaming-benchmarks
>
> And the Flink guys extended it:
> https://github.com/dataArtisans/yahoo-streaming-benchmark
>
> Notice that the common approach comes from the classical database world, and
> it is to take one of the TPC query suites (TPC-H or TPC-DS) and evaluate a
> data processing framework against it; Spark does this to evaluate their SQL
> performance.
>
> https://github.com/databricks/spark-sql-perf
>
> However, this approach is not 100% aligned with Beam because AFAIK there is
> no TPC suite for continuous processing; that's the reason I found the NexMark
> suite a more appropriate example.
>
>
> On Tue, Oct 18, 2016 at 9:50 PM, Ismaël Mejía <[email protected]> wrote:
>
> > Hello,
> >
> > Now that we are discussing the subject of performance testing, I want to
> > jump into the conversation to remind everybody that we have a really
> > interesting benchmarking suite already contributed by Google that has
> > (sadly) not been merged yet.
> >
> > https://github.com/apache/incubator-beam/pull/366
> > https://issues.apache.org/jira/browse/BEAM-160
> >
> > This is not exactly the kind of benchmark of the current discussion, but
> > for me it is a super valuable contribution that I hope we can use/refine
> > to evaluate the runners.
> >
> > Ismaël Mejía
> >
> >
> > On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré <[email protected]>
> > wrote:
> >
> >> It sounds like a good idea to me.
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 10/18/2016 08:08 PM, Amit Sela wrote:
> >>
> >>> @Jesse how about runners "tracing" the constructed DAG (by Beam) so that
> >>> it's clear what the runner actually executed?
> >>>
> >>> Example:
> >>> For the SparkRunner, a ParDo translates to a mapPartitions
> >>> transformation.
> >>>
> >>> That could provide transparency when debugging/benchmarking pipelines
> >>> per-runner.
> >>>
> >>> On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <[email protected]>
> >>> wrote:
> >>>
> >>>> @Dan before starting with Beam, I'd want to know how much performance I'd
> >>>> be giving up by not programming directly to the API.
> >>>>
> >>>> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin <[email protected]>
> >>>> wrote:
> >>>>> I think there are lots of excellent one-off performance studies, but I'm
> >>>>> not sure how useful that is to Beam.
> >>>>>
> >>>>> From a test infra point of view, I'm wondering more about tracking of
> >>>>> performance over time, identifying regressions, etc.
> >>>>>
> >>>>> Google has some tools like PerfKit
> >>>>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
> >>>>> basically a skin on a database + some scripts to load and query data; but
> >>>>> I don't love it. Do other Apache projects do public, long-term
> >>>>> benchmarking and performance regression testing?
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> I found data Artisans' benchmarking post
> >>>>>> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> >>>>>> They also shared the code <https://github.com/dataArtisans/performance>. I
> >>>>>> didn't dig in much, but they did a wide range of algorithms. They have
> >>>>>> the native code, so you write the Beam code and check against the native
> >>>>>> performance.
> >>>>>>
> >>>>>> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
> >>>>>> <[email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Jason, I have been busy bench-marking Flink Cluster (Spark next)
> >>>>>>> under Beam. I can share my experience. Can you list items of interest
> >>>>>>> to know, so I can answer them to the best of my knowledge. Cheers
> >>>>>>>
> >>>>>>>      From: Jason Kuster <[email protected]>
> >>>>>>>  To: [email protected]
> >>>>>>>  Sent: Monday, October 17, 2016 5:06 PM
> >>>>>>>  Subject: Exploring Performance Testing
> >>>>>>>
> >>>>>>> Hey all,
> >>>>>>>
> >>>>>>> Now that we've covered some of the initial ground with regard to
> >>>>>>> correctness testing, I'm going to be starting work on performance
> >>>>>>> testing and benchmarking. I wanted to reach out and see what people's
> >>>>>>> experiences have been with performance testing and benchmarking
> >>>>>>> frameworks, particularly in other Apache projects. Anyone have any
> >>>>>>> experience or thoughts?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Jason
> >>>>>>>
> >>>>>>> --
> >>>>>>> -------
> >>>>>>> Jason Kuster
> >>>>>>> Apache Beam (Incubating) / Google Cloud Dataflow
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >> --
> >> Jean-Baptiste Onofré
> >> [email protected]
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
> >
>

   
