Thanks for the comments Zhu Zhu!

> 1. How do we measure the job throughput? By measuring the job execution
> time on a finite input data set, or measuring the QPS when the job has
> reached a stable state?
> I ask this because, with the LazyFromSource schedule mode, tasks are
> launched gradually as processing progresses.
> So if we are measuring the throughput in the latter way,
> LazyFromSource scheduling would make no difference from Eager
> scheduling, so we can drop this dimension if taking this way.
> By measuring the total execution time, however, it can be kept, since
> scheduling effectiveness can make a difference, especially in small
> input data set cases.
We plan to measure the job throughput by measuring the QPS when the job has
reached a stable state. If, as you said, there is no difference between
LazyFromSource and Eager under this way of measuring, we can adjust the test
scenarios after running for a while and remove the duplicated part.

> 2. In our prior experiences, the performance result is usually not that
> stable, which may make the perf degradation harder to detect.
> Shall we define the rounds to run a job and how to aggregate the
> result, so that we can get a more reliable final performance result?

Good advice. We plan to run multiple rounds (5 is the default value) per
scenario, then calculate the average value as the result.

> On Nov 21, 2019, at 3:01 PM, Zhu Zhu <reed...@gmail.com> wrote:
>
> Thanks Yu for bringing up this discussion.
> The e2e perf tests can be really helpful and the overall design looks good
> to me.
>
> Sorry it's late but I have 2 questions about the result check.
>
> Thanks,
> Zhu Zhu
>
> Yu Li <car...@gmail.com> wrote on Thu, Nov 14, 2019 at 10:52 AM:
>
>> Since one week has passed with no more comments, I assume the latest FLIP
>> doc looks good to all, and I will open a VOTE thread for the FLIP soon.
>> Thanks for all the comments and discussion!
>>
>> Best Regards,
>> Yu
>>
>>
>> On Thu, 7 Nov 2019 at 18:35, Yu Li <car...@gmail.com> wrote:
>>
>>> Thanks for the comments Biao!
>>>
>>> bq. It seems this proposal is separated into several stages. Is there a
>>> more detailed plan?
>>> Good point! For stage one we'd like to try introducing the benchmark
>>> first, so we can guard the release (hopefully starting from 1.10). For
>>> the other stages we don't have a detailed plan yet, but will add child
>>> FLIPs as we move on and open new discussions/votes separately. I have
>>> updated the FLIP document to better reflect this; please check it and
>>> let me know what you think. Thanks.
>>>
>>> Best Regards,
>>> Yu
>>>
>>>
>>> On Tue, 5 Nov 2019 at 10:16, Biao Liu <mmyy1...@gmail.com> wrote:
>>>
>>>> Thanks Yu for bringing up this topic.
>>>>
>>>> +1 for this proposal. Glad to have e2e performance testing.
>>>>
>>>> It seems this proposal is separated into several stages. Is there a
>>>> more detailed plan?
>>>>
>>>> Thanks,
>>>> Biao /'bɪ.aʊ/
>>>>
>>>>
>>>> On Mon, 4 Nov 2019 at 19:54, Congxian Qiu <qcx978132...@gmail.com> wrote:
>>>>
>>>>> +1 for this idea.
>>>>>
>>>>> Currently, we have the micro benchmarks for Flink, which can help us
>>>>> find regressions. I think the e2e job performance testing can also
>>>>> help us cover more scenarios.
>>>>>
>>>>> Best,
>>>>> Congxian
>>>>>
>>>>>
>>>>> Jingsong Li <jingsongl...@gmail.com> wrote on Mon, Nov 4, 2019 at 5:37 PM:
>>>>>
>>>>>> +1 for the idea. Thanks Yu for driving this.
>>>>>> Just curious whether we can collect metrics about job scheduling
>>>>>> and task launch; the speed of this part is also important.
>>>>>> We can add tests to watch it too.
>>>>>>
>>>>>> Looking forward to more batch test support.
>>>>>>
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>>
>>>>>> On Mon, Nov 4, 2019 at 10:00 AM OpenInx <open...@gmail.com> wrote:
>>>>>>
>>>>>>>> The test cases are written in Java and the scripts in Python. We
>>>>>>>> propose a separate directory/module in parallel with
>>>>>>>> flink-end-to-end-tests, with the name of flink-end-to-end-perf-tests.
>>>>>>>
>>>>>>> Glad to see that the newly introduced e2e tests will be written in
>>>>>>> Java, because I'm re-working the existing e2e test suites from BASH
>>>>>>> scripts to Java test cases so that we can support more external
>>>>>>> systems, such as running the testing job on yarn+flink, docker+flink,
>>>>>>> standalone+flink, a distributed Kafka cluster, etc.
>>>>>>> BTW, I think the perf e2e test suites will also need to be designed
>>>>>>> to support running on both standalone and distributed environments,
>>>>>>> which will be helpful for developing & evaluating the perf.
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On Mon, Nov 4, 2019 at 9:31 AM aihua li <liaihua1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> In stage 1, the checkpoint mode isn't disabled, and heap is used as
>>>>>>>> the state backend.
>>>>>>>> I think there should be some special scenarios to test checkpointing
>>>>>>>> and state backends, which will be discussed and added in release-1.11.
>>>>>>>>
>>>>>>>>> On Nov 2, 2019, at 12:13 AM, Yun Tang <myas...@live.com> wrote:
>>>>>>>>>
>>>>>>>>> By the way, do you think it's worthwhile to add a checkpoint mode
>>>>>>>>> which just disables checkpointing when running the end-to-end jobs?
>>>>>>>>> And when will stage 2 and stage 3 be discussed in more detail?
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best, Jingsong Lee
>>>>>
>>>>
>>>
>>
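PS: since the scripts are planned to be in Python, the rounds-and-average
aggregation mentioned above could look roughly like the following. This is
only a minimal sketch, not code from the FLIP; the function name, the
sampling scheme, and the sample numbers are all made up for illustration.
It assumes each round yields a list of stable-state QPS samples (taken after
warm-up), averages each round, then averages the per-round means into the
final result for the scenario.

```python
import statistics

def aggregate_rounds(qps_samples_per_round):
    """Final result for one scenario: the mean over rounds of each
    round's mean stable-state QPS (samples taken after warm-up)."""
    round_means = [statistics.mean(samples) for samples in qps_samples_per_round]
    return statistics.mean(round_means)

# Hypothetical stable-state QPS samples from 5 rounds of one scenario.
rounds = [
    [1000.0, 1020.0, 990.0],
    [1010.0, 1005.0, 995.0],
    [980.0, 1000.0, 1020.0],
    [1015.0, 985.0, 1000.0],
    [1005.0, 995.0, 1000.0],
]
print(round(aggregate_rounds(rounds), 1))  # prints 1001.3
```

With only five rounds the mean is still sensitive to outliers; if the
results stay as noisy as Zhu Zhu described, taking the median per round
(`statistics.median`) is a common, more robust alternative.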