Hello,

Now that we are discussing performance testing, I want to jump into the
conversation to remind everybody that we have a really interesting
benchmarking suite, already contributed by Google, that has (sadly) not
been merged yet.

https://github.com/apache/incubator-beam/pull/366
https://issues.apache.org/jira/browse/BEAM-160

This is not exactly the kind of benchmark under discussion, but for me it
is a super valuable contribution that I hope we can use/refine to evaluate
the runners.

Ismaël Mejía


On Tue, Oct 18, 2016 at 8:16 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> It sounds like a good idea to me.
>
> Regards
> JB
>
>
> On 10/18/2016 08:08 PM, Amit Sela wrote:
>
>> @Jesse how about runners "tracing" the constructed DAG (by Beam) so that
>> it's clear what the runner actually executed?
>>
>> Example:
>> For the SparkRunner, a ParDo translates to a mapPartitions transformation.
>>
>> That could provide transparency when debugging/benchmarking pipelines
>> per-runner.
>>
>> On Tue, Oct 18, 2016 at 8:25 PM Jesse Anderson <je...@smokinghand.com>
>> wrote:
>>
>>> @Dan before starting with Beam, I'd want to know how much performance I'm
>>> giving up by not programming directly to the API.
>>>
>>> On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin
>>> <dhalp...@google.com.invalid> wrote:
>>>
>>>> I think there are lots of excellent one-off performance studies, but I'm
>>>> not sure how useful that is to Beam.
>>>>
>>>> From a test infra point of view, I'm wondering more about tracking of
>>>> performance over time, identifying regressions, etc.
>>>>
>>>> Google has some tools like PerfKit
>>>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
>>>> basically a skin on a database + some scripts to load and query data;
>>>> but I don't love it. Do other Apache projects do public, long-term
>>>> benchmarking and performance regression testing?
>>>>
>>>> Dan
>>>>
>>>> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com>
>>>> wrote:
>>>>
>>>>> I found data Artisans' benchmarking post
>>>>> <http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
>>>>> They also shared the code <https://github.com/dataArtisans/performance>.
>>>>> I didn't dig in much, but they did a wide range of algorithms. They have
>>>>> the native code, so you write the Beam code and check against the native
>>>>> performance.
>>>>>
>>>>> On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
>>>>> <amirto...@yahoo.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi Jason, I have been busy benchmarking Flink Cluster (Spark next) under
>>>>>> Beam. I can share my experience. Can you list items of interest to know
>>>>>> so I can answer them to the best of my knowledge? Cheers
>>>>>>
>>>>>>       From: Jason Kuster <jasonkus...@google.com.INVALID>
>>>>>>  To: dev@beam.incubator.apache.org
>>>>>>  Sent: Monday, October 17, 2016 5:06 PM
>>>>>>  Subject: Exploring Performance Testing
>>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> Now that we've covered some of the initial ground with regard to
>>>>>> correctness testing, I'm going to be starting work on performance
>>>>>> testing and benchmarking. I wanted to reach out and see what people's
>>>>>> experiences have been with performance testing and benchmarking
>>>>>> frameworks, particularly in other Apache projects. Anyone have any
>>>>>> experience or thoughts?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jason
>>>>>>
>>>>>> --
>>>>>> -------
>>>>>> Jason Kuster
>>>>>> Apache Beam (Incubating) / Google Cloud Dataflow
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
