+1 for the feature.

On Thu, Apr 28, 2022 at 11:36, Mang Zhang <zhangma...@163.com> wrote:
> Hi Zhu,
>
> This sounds like great work! Thanks for taking it on.
> In our company, there are already some jobs using Flink Batch,
> but everyone knows that the offline cluster has a lot more load than
> the online cluster, and the failure rate of the machines is also much higher.
> If this work is done, we'd love to use it; it's simply awesome for our
> Flink users.
> Thanks again!
>
> --
>
> Best regards,
> Mang Zhang
>
> At 2022-04-27 10:46:06, "Zhu Zhu" <zh...@apache.org> wrote:
> >Hi everyone,
> >
> >More and more users are running their batch jobs on Flink nowadays.
> >One major problem they encounter is slow tasks running on hot/bad
> >nodes, resulting in very long and uncontrollable execution time of
> >batch jobs. This problem is a pain or even unacceptable in
> >production. Many users have been asking for a solution for it.
> >
> >Therefore, I'd like to revive the discussion of speculative
> >execution to solve this problem.
> >
> >Weijun Wang, Jing Zhang, Lijie Wang and I had some offline
> >discussions to refine the design[1]. We also implemented a PoC[2]
> >and verified it using TPC-DS benchmarks and production jobs.
> >
> >Looking forward to your feedback!
> >
> >Thanks,
> >Zhu
> >
> >[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job
> >[2] https://github.com/zhuzhurk/flink/commits/1.14-speculative-execution-poc
> >
> >On Mon, Dec 13, 2021 at 11:38, 刘建刚 <liujiangangp...@gmail.com> wrote:
> >
> >> Any progress on the feature? We have the same requirement in our company.
> >> Since the software and hardware environment can be complex, it is normal to
> >> see a slow task that determines the execution time of the Flink job.
> >>
> >> On Sun, Jun 20, 2021 at 22:35, <wangw...@sina.cn> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > I would like to kick off a discussion on speculative execution for
> >> > batch jobs.
> >> > I have created FLIP-168 [1], which clarifies our motivation for doing
> >> > this and contains some improvement proposals for the new design.
> >> > It would be great to resolve the problem of long-tail tasks in batch jobs.
> >> > Please let me know your thoughts. Thanks.
> >> > Regards,
> >> > wangwj
> >> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job