Thanks for the comments Zhu Zhu!

> 1. How do we measure the job throughput? By measuring the job execution
> time on a finite input data set, or measuring the QPS when the job has
> reached a stable state?
> I ask this because, with the LazyFromSource schedule mode, tasks are
> launched gradually as processing progresses.
> So if we are measuring the throughput in the latter way,
> LazyFromSource scheduling would make no difference from Eager
> scheduling, so we can drop this dimension if taking this way.
> By measuring the total execution time, however, it can be kept, since
> scheduling effectiveness can make a difference, especially in small
> input data set cases.
We plan to measure the job throughput by measuring the QPS when the job has
reached a stable state. If, as you said, there is no difference between
LazyFromSource and Eager under this way of measuring, we can adjust the test
scenarios after running for a while and remove the duplicated part.

> 2. In our prior experiences, the performance result is usually not that
> stable, which may make the perf degradation harder to detect.
> Shall we define the rounds to run a job and how to aggregate the
> result, so that we can get a more reliable final performance result?

Good advice. We plan to run multiple rounds (5 is the default value) per
scenario, then calculate the average value as the result.

> On Nov 21, 2019, at 3:01 PM, Zhu Zhu <reed...@gmail.com> wrote:
>
> Thanks Yu for bringing up this discussion.
> The e2e perf tests can be really helpful and the overall design looks good
> to me.
>
> Sorry it's late but I have 2 questions about the result check.
>
> Thanks,
> Zhu Zhu
>
> Yu Li <car...@gmail.com> wrote on Thu, Nov 14, 2019 at 10:52 AM:
>
>> Since one week has passed with no more comments, I assume the latest FLIP
>> doc looks good to all, and I will open a VOTE thread for the FLIP soon.
>> Thanks for all the comments and discussion!
>>
>> Best Regards,
>> Yu
>>
>>
>> On Thu, 7 Nov 2019 at 18:35, Yu Li <car...@gmail.com> wrote:
>>
>>> Thanks for the comments Biao!
>>>
>>> bq. It seems this proposal is separated into several stages. Is there a
>>> more detailed plan?
>>> Good point! For stage one we'd like to try introducing the benchmark
>>> first, so we can guard the release (hopefully starting from 1.10). For
>>> the other stages we don't have a detailed plan yet, but will add child
>>> FLIPs as we move on and open new discussions/votes separately. I have
>>> updated the FLIP document to better reflect this; please check it and
>>> let me know what you think. Thanks.
>>>
>>> Best Regards,
>>> Yu
>>>
>>>
>>> On Tue, 5 Nov 2019 at 10:16, Biao Liu <mmyy1...@gmail.com> wrote:
>>>
>>>> Thanks Yu for bringing up this topic.
>>>>
>>>> +1 for this proposal. Glad to have e2e performance testing.
>>>>
>>>> It seems this proposal is separated into several stages. Is there a
>>>> more detailed plan?
>>>>
>>>> Thanks,
>>>> Biao /'bɪ.aʊ/
>>>>
>>>>
>>>> On Mon, 4 Nov 2019 at 19:54, Congxian Qiu <qcx978132...@gmail.com> wrote:
>>>>
>>>>> +1 for this idea.
>>>>>
>>>>> Currently, we have the micro benchmarks for Flink, which can help us
>>>>> find regressions. I think the e2e job performance testing can also
>>>>> help us cover more scenarios.
>>>>>
>>>>> Best,
>>>>> Congxian
>>>>>
>>>>>
>>>>> Jingsong Li <jingsongl...@gmail.com> wrote on Mon, Nov 4, 2019 at 5:37 PM:
>>>>>
>>>>>> +1 for the idea. Thanks Yu for driving this.
>>>>>> Just curious whether we can collect metrics about job scheduling
>>>>>> and task launch; the speed of this part is also important.
>>>>>> We can add tests to watch it too.
>>>>>>
>>>>>> Looking forward to more batch test support.
>>>>>>
>>>>>> Best,
>>>>>> Jingsong Lee
>>>>>>
>>>>>> On Mon, Nov 4, 2019 at 10:00 AM OpenInx <open...@gmail.com> wrote:
>>>>>>
>>>>>>>> The test cases are written in Java and the scripts in Python. We
>>>>>>>> propose a separate directory/module in parallel with
>>>>>>>> flink-end-to-end-tests, with the name of flink-end-to-end-perf-tests.
>>>>>>>
>>>>>>> Glad to see that the newly introduced e2e tests will be written in
>>>>>>> Java, because I'm re-working the existing e2e test suites from BASH
>>>>>>> scripts to Java test cases so that we can support more external
>>>>>>> systems, such as running the testing job on yarn+flink, docker+flink,
>>>>>>> standalone+flink, a distributed Kafka cluster, etc.
>>>>>>> BTW, I think the perf e2e test suites will also need to be designed
>>>>>>> to support running on both standalone and distributed environments,
>>>>>>> which will be helpful for developing & evaluating the perf.
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On Mon, Nov 4, 2019 at 9:31 AM aihua li <liaihua1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> In stage 1, the checkpoint mode isn't disabled, and heap is used as
>>>>>>>> the state backend.
>>>>>>>> I think there should be some special scenarios to test checkpointing
>>>>>>>> and state backends, which will be discussed and added in release-1.11.
>>>>>>>>
>>>>>>>>> On Nov 2, 2019, at 12:13 AM, Yun Tang <myas...@live.com> wrote:
>>>>>>>>>
>>>>>>>>> By the way, do you think it's worthwhile to add a checkpoint mode
>>>>>>>>> which just disables checkpointing when running the end-to-end jobs?
>>>>>>>>> And when will stage 2 and stage 3 be discussed in more detail?
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best, Jingsong Lee
>>>>>
>>>>
>>>
>>
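PS: since the scripts are planned to be in Python, the rounds-and-average
aggregation mentioned above could look roughly like the following. This is
only a minimal sketch, not code from the FLIP; the function name, the
sampling scheme, and the sample numbers are all made up for illustration.
It assumes each round yields a list of stable-state QPS samples (taken after
warm-up), averages each round, then averages the per-round means into the
final result for the scenario.

```python
import statistics

def aggregate_rounds(qps_samples_per_round):
    """Final result for one scenario: the mean over rounds of each
    round's mean stable-state QPS (samples taken after warm-up)."""
    round_means = [statistics.mean(samples) for samples in qps_samples_per_round]
    return statistics.mean(round_means)

# Hypothetical stable-state QPS samples from 5 rounds of one scenario.
rounds = [
    [1000.0, 1020.0, 990.0],
    [1010.0, 1005.0, 995.0],
    [980.0, 1000.0, 1020.0],
    [1015.0, 985.0, 1000.0],
    [1005.0, 995.0, 1000.0],
]
print(round(aggregate_rounds(rounds), 1))  # prints 1001.3
```

With only five rounds the mean is still sensitive to outliers; if the
results stay as noisy as Zhu Zhu described, taking the median per round
(`statistics.median`) is a common, more robust alternative.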