FYI, there was an outstanding PR about adding the Nexmark suite:
https://github.com/apache/incubator-beam/pull/366

On Tue, Oct 18, 2016 at 1:12 PM, Ismaël Mejía <ieme...@gmail.com> wrote:

> @Jason, just some additional refs for ideas, since I already researched a
> little bit about how people evaluated this in other Apache projects.
>
> Yahoo published a benchmarking analysis of different streaming frameworks
> about a year ago:
> https://github.com/yahoo/streaming-benchmarks
>
> And the Flink folks extended it:
> https://github.com/dataArtisans/yahoo-streaming-benchmark
>
> Notice that the common approach comes from the classical database world:
> take one of the TPC query suites (TPC-H or TPC-DS) and evaluate a data
> processing framework against it. Spark does this to evaluate its SQL
> performance:
>
> https://github.com/databricks/spark-sql-perf
>
> However, this approach is not 100% aligned with Beam because AFAIK there is
> no TPC suite for continuous processing. That is why I found the NexMark
> suite to be a more appropriate example.
>
>
> On Tue, Oct 18, 2016 at 9:50 PM, Ismaël Mejía <ieme...@gmail.com> wrote:
>
> > Hello,
> >
> > Now that we are discussing performance testing, I want to jump into the
> > conversation to remind everybody that we have a really interesting
> > benchmarking suite already contributed by Google that has (sadly) not been
> > merged yet.
> >
> > https://github.com/apache/incubator-beam/pull/366
> > https://issues.apache.org/jira/browse/BEAM-160
> >
> > This is not exactly the kind of benchmark under discussion, but to me it
> > is a super valuable contribution that I hope we can use/refine to evaluate
> > the runners.
> >
> > Ismaël Mejía
> >
> >
> > On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> It sounds like a good idea to me.
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 10/18/2016 08:08 PM, Amit Sela wrote:
> >>
> >>> @Jesse how about runners "tracing" the constructed DAG (by Beam) so that
> >>> it's clear what the runner actually executed?
> >>>
> >>> Example:
> >>> For the SparkRunner, a ParDo translates to a mapPartitions
> >>> transformation.
> >>>
> >>> That could provide transparency when debugging/benchmarking pipelines
> >>> per-runner.
> >>>
> >>> On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com>
> >>> wrote:
> >>>
> >>>> @Dan before starting with Beam, I'd want to know how much performance
> >>>> I'm giving up by not programming directly to the API.
> >>>>
> >>>> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin
> >>>> <dhalp...@google.com.invalid> wrote:
> >>>>
> >>>>> I think there are lots of excellent one-off performance studies, but I'm
> >>>>> not sure how useful they are to Beam.
> >>>>>
> >>>>> From a test infra point of view, I'm wondering more about tracking of
> >>>>> performance over time, identifying regressions, etc.
> >>>>>
> >>>>> Google has some tools like PerfKit
> >>>>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker>, which is
> >>>>> basically a skin on a database plus some scripts to load and query data,
> >>>>> but I don't love it. Do other Apache projects do public, long-term
> >>>>> benchmarking and performance regression testing?
> >>>>>
> >>>>> Dan
> >>>>>
> >>>>> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I found data Artisans' benchmarking post
> >>>>>> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
> >>>>>> They also shared the code <https://github.com/dataArtisans/performance>.
> >>>>>> I didn't dig in much, but they covered a wide range of algorithms. They
> >>>>>> have the native code, so you could write the Beam code and compare it
> >>>>>> against the native performance.
> >>>>>>
> >>>>>> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
> >>>>>> <amirto...@yahoo.com.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi Jason, I have been busy benchmarking a Flink cluster (Spark next)
> >>>>>>> under Beam. I can share my experience. Can you list the items you are
> >>>>>>> interested in so I can answer them to the best of my knowledge? Cheers
> >>>>>>>
> >>>>>>>       From: Jason Kuster <jasonkus...@google.com.INVALID>
> >>>>>>>  To: dev@beam.incubator.apache.org
> >>>>>>>  Sent: Monday, October 17, 2016 5:06 PM
> >>>>>>>  Subject: Exploring Performance Testing
> >>>>>>>
> >>>>>>> Hey all,
> >>>>>>>
> >>>>>>> Now that we've covered some of the initial ground with regard to
> >>>>>>> correctness testing, I'm going to be starting work on performance
> >>>>>>> testing and benchmarking. I wanted to reach out and see what people's
> >>>>>>> experiences have been with performance testing and benchmarking
> >>>>>>> frameworks, particularly in other Apache projects. Anyone have any
> >>>>>>> experience or thoughts?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Jason
> >>>>>>>
> >>>>>>> --
> >>>>>>> -------
> >>>>>>> Jason Kuster
> >>>>>>> Apache Beam (Incubating) / Google Cloud Dataflow
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >> --
> >> Jean-Baptiste Onofré
> >> jbono...@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
> >
>
