[ 
https://issues.apache.org/jira/browse/BEAM-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496870#comment-16496870
 ] 

Pablo Estrada edited comment on BEAM-4432 at 5/31/18 5:16 PM:
--------------------------------------------------------------

Hi Ismael!
I understand your concerns. The code for these sources has specific knobs that 
will allow us to test specific scenarios. As an example, we're looking to study 
GroupByKey performance in different runners (especially for batch), and this 
requires testing a uniform key distribution, but also a distribution that is 
heavily weighted towards a few keys (Zipf, one-key). We'll also work on tests 
for Combine, CoGroupByKey, ParDo (to understand per-fused-step overhead), etc.
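A minimal, stdlib-only sketch of what such a key-distribution knob could look like. The function `generate_keys`, its parameters, and the Zipf exponent default are hypothetical, not the API of the sources discussed here. The point is only to show how uniform, Zipf and one-key inputs differ in how many records land on the hottest key, which is what stresses a GroupByKey.

```python
import random
from collections import Counter
from itertools import accumulate


def generate_keys(num_records, num_keys, distribution="uniform", zipf_s=1.2, seed=0):
    """Return num_records integer keys in [0, num_keys) drawn from a key distribution.

    Hypothetical helper: "uniform", "zipf" and "one_key" are illustrative names.
    """
    rng = random.Random(seed)  # fixed seed, so the same inputs give the same keys
    if distribution == "one_key":
        # Every record shares one key: the worst case for a GroupByKey.
        return [0] * num_records
    if distribution == "uniform":
        return [rng.randrange(num_keys) for _ in range(num_records)]
    if distribution == "zipf":
        # Key of rank r (1-based) gets weight 1 / r**s, so a few keys dominate.
        weights = [1.0 / rank ** zipf_s for rank in range(1, num_keys + 1)]
        return rng.choices(range(num_keys), cum_weights=list(accumulate(weights)), k=num_records)
    raise ValueError(f"unknown distribution: {distribution!r}")


if __name__ == "__main__":
    n = 10_000
    for dist in ("uniform", "zipf", "one_key"):
        counts = Counter(generate_keys(n, num_keys=100, distribution=dist))
        top_share = counts.most_common(1)[0][1] / n
        print(f"{dist:8s} distinct keys={len(counts):3d} hottest key share={top_share:.2f}")
```

With the one-key setting, a single worker ends up holding every value for the grouped key, so it exercises the skew case more harshly than any Zipf exponent would.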

Later on, as we decide to test dynamic rebalancing, we may want to have steps 
that produce artificial CPU-based or sleep-based delays (`SyntheticStep`), to 
see how runners react to different unexpected per-element processing delays.
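A rough sketch of the per-element delay idea, assuming a plain Python generator rather than any real Beam transform; `synthetic_step`, `delay_sec`, `delay_type` and `slow_fraction` are hypothetical names. The "sleep" mode idles without using CPU, while the "cpu" mode busy-waits, which may matter to autoscaling policies that look at CPU load. Inside a Beam pipeline the same loop body could sit in a `DoFn.process` method.

```python
import random
import time


def synthetic_step(elements, delay_sec=0.0, delay_type="sleep", slow_fraction=1.0, seed=0):
    """Yield each element unchanged after an artificial per-element delay.

    Hypothetical stand-in for a SyntheticStep-like transform:
      delay_type="sleep": idle wait (little CPU used)
      delay_type="cpu":   busy-wait (burns CPU for the whole delay)
      slow_fraction:      share of elements that are delayed, to mimic stragglers
    """
    if delay_type not in ("sleep", "cpu"):
        raise ValueError(f"unknown delay_type: {delay_type!r}")
    rng = random.Random(seed)  # seeded so the same elements are slow on every run
    for element in elements:
        if delay_sec > 0 and rng.random() < slow_fraction:
            if delay_type == "sleep":
                time.sleep(delay_sec)
            else:
                deadline = time.perf_counter() + delay_sec
                while time.perf_counter() < deadline:
                    pass  # spin until the deadline
        yield element


if __name__ == "__main__":
    start = time.perf_counter()
    out = list(synthetic_step(range(5), delay_sec=0.01, delay_type="cpu"))
    print(out, f"elapsed={time.perf_counter() - start:.3f}s")
```

Setting `slow_fraction` below 1.0 delays only some elements, which is closer to the "unexpected" delays a rebalancing test would want to provoke.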

I'd like to go forward with this, and focus next on adding a robust suite of 
perf tests. WDYT?


> Performance tests need a way to generate Synthetic data
> -------------------------------------------------------
>
>                 Key: BEAM-4432
>                 URL: https://issues.apache.org/jira/browse/BEAM-4432
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Pablo Estrada
>            Assignee: Pablo Estrada
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> GenerateSequence falls short in this regard, as we may want to generate 
> data in custom distributions, with specific repeatability requirements, 
> or with hardcoded delays for autoscaling.
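One way to meet the repeatability requirement, sketched under the assumption that each record is derived only from a seed and its index (the helpers `synthetic_record` and `read_range` and the record sizes are hypothetical): record i is always the same bytes, no matter how the index range is split across workers or how often a split is re-read. `hashlib.shake_256` is used only because it can produce output of any requested length.

```python
import hashlib


def synthetic_record(index, seed=0, key_size=10, value_size=20):
    """Return a (key, value) pair of bytes determined only by (seed, index)."""
    # SHAKE-256 is an extendable-output hash: digest(n) returns exactly n bytes.
    data = hashlib.shake_256(f"{seed}:{index}".encode()).digest(key_size + value_size)
    return data[:key_size], data[key_size:]


def read_range(start, stop, seed=0):
    """Generate the records with indices in [start, stop), e.g. one split of a source."""
    return [synthetic_record(i, seed) for i in range(start, stop)]


if __name__ == "__main__":
    whole = read_range(0, 100)
    # Splitting the range at an arbitrary point yields the same records.
    print(whole == read_range(0, 37) + read_range(37, 100))
```

Because no generator state is carried between records, a split can be re-read or subdivided without coordinating any random-number state across workers.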



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
