@Jason, just some additional references for ideas, since I have already
researched a bit how people evaluate this in other Apache projects.

Yahoo published a benchmarking analysis of different streaming frameworks
about a year ago:
https://github.com/yahoo/streaming-benchmarks

And the Flink guys extended it:
https://github.com/dataArtisans/yahoo-streaming-benchmark

Note that the common approach comes from the classical database world: take
one of the TPC query suites (TPC-H or TPC-DS) and evaluate a data processing
framework against it. Spark does this to evaluate its SQL performance.

https://github.com/databricks/spark-sql-perf

However, this approach is not 100% aligned with Beam because AFAIK there is
no TPC suite for continuous processing. That is why I found the NexMark suite
to be a more appropriate example.


On Tue, Oct 18, 2016 at 9:50 PM, Ismaël Mejía <ieme...@gmail.com> wrote:

> Hello,
>
> Now that we are discussing the subject of performance testing, I want to
> jump into the conversation to remind everybody that we have a really
> interesting benchmarking suite already contributed by Google that has
> (sadly) not been merged yet.
>
> https://github.com/apache/incubator-beam/pull/366
> https://issues.apache.org/jira/browse/BEAM-160
>
> This is not exactly the kind of benchmark in the current discussion, but
> for me it is a super valuable contribution that I hope we can use/refine to
> evaluate the runners.
>
> Ismaël Mejía
>
>
> On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> It sounds like a good idea to me.
>>
>> Regards
>> JB
>>
>>
>> On 10/18/2016 08:08 PM, Amit Sela wrote:
>>
>>> @Jesse how about runners "tracing" the constructed DAG (by Beam) so that
>>> it's clear what the runner actually executed?
>>>
>>> Example:
>>> For the SparkRunner, a ParDo translates to a mapPartitions
>>> transformation.
>>>
>>> That could provide transparency when debugging/benchmarking pipelines
>>> per-runner.
>>>
>>> On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com>
>>> wrote:
>>>
>>>> @Dan before starting with Beam, I'd want to know how much performance
>>>> I'm giving up by not programming directly to the API.
>>>>
>>>> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin
>>>> <dhalp...@google.com.invalid> wrote:
>>>>
>>>>> I think there are lots of excellent one-off performance studies, but I'm
>>>>> not sure how useful that is to Beam.
>>>>>
>>>>> From a test infra point of view, I'm wondering more about tracking of
>>>>> performance over time, identifying regressions, etc.
>>>>>
>>>>> Google has some tools like PerfKit
>>>>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
>>>>> basically a skin on a database + some scripts to load and query data;
>>>>> but I don't love it. Do other Apache projects do public, long-term
>>>>> benchmarking and performance regression testing?
>>>>>
>>>>> Dan
>>>>>
>>>>> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com>
>>>>> wrote:
>>>>>
>>>>>> I found data Artisans' benchmarking post
>>>>>> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
>>>>>> They also shared the code <https://github.com/dataArtisans/performance>.
>>>>>> I didn't dig in much, but they did a wide range of algorithms. They have
>>>>>> the native code, so you write the Beam code and check against the native
>>>>>> performance.
>>>>>>
>>>>>> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
>>>>>> <amirto...@yahoo.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jason, I have been busy benchmarking a Flink cluster (Spark next)
>>>>>>> under Beam. I can share my experience. Can you list the items of
>>>>>>> interest so I can answer them to the best of my knowledge? Cheers
>>>>>>>
>>>>>>>       From: Jason Kuster <jasonkus...@google.com.INVALID>
>>>>>>>  To: dev@beam.incubator.apache.org
>>>>>>>  Sent: Monday, October 17, 2016 5:06 PM
>>>>>>>  Subject: Exploring Performance Testing
>>>>>>>
>>>>>>> Hey all,
>>>>>>>
>>>>>>> Now that we've covered some of the initial ground with regard to
>>>>>>> correctness testing, I'm going to be starting work on performance
>>>>>>> testing and benchmarking. I wanted to reach out and see what people's
>>>>>>> experiences have been with performance testing and benchmarking
>>>>>>> frameworks, particularly in other Apache projects. Anyone have any
>>>>>>> experience or thoughts?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jason
>>>>>>>
>>>>>>> --
>>>>>>> -------
>>>>>>> Jason Kuster
>>>>>>> Apache Beam (Incubating) / Google Cloud Dataflow
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
