Hi Rui and Yong,

Thanks for your reply.

My initial intention here was to address short-lived jobs under high QPS: a
fixed-delay retry strategy wastes resources and is not flexible enough,
while an exponential-backoff strategy might significantly increase query
latency because the interval grows too fast. An incremental-delay strategy
could strike a balance between resource consumption and short-query
latency.

On second thought, an exponential-delay retry strategy with a
configurable multiplier option can also achieve this goal. By setting the
default value of the multiplier to 1, we stay consistent with the original
behavior and reduce the number of configuration items at the same time.
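
To make the multiplier idea concrete, here is a rough sketch of how the
retry delays would evolve. It is only an illustration, not the code in the
FLIP: the class name RetryDelaySketch, the delays helper, its parameter
names, and the maxDelayMs cap are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryDelaySketch {

    /**
     * Returns the delay (in ms) before each retry attempt:
     * initialDelayMs * multiplier^attempt, capped at maxDelayMs.
     * All names here are hypothetical, chosen only for illustration.
     */
    public static List<Long> delays(
            long initialDelayMs, double multiplier, long maxDelayMs, int attempts) {
        List<Long> result = new ArrayList<>();
        double current = initialDelayMs;
        for (int i = 0; i < attempts; i++) {
            // Cap the delay so the interval cannot grow without bound.
            result.add(Math.min(Math.round(current), maxDelayMs));
            current *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // multiplier = 1: the delay never changes, i.e. fixed-delay behavior.
        System.out.println(delays(100, 1.0, 1000, 5)); // [100, 100, 100, 100, 100]
        // multiplier = 1.5: gentler growth, closer to an incremental delay.
        System.out.println(delays(100, 1.5, 1000, 5)); // [100, 150, 225, 338, 506]
        // multiplier = 2: classic exponential backoff, capped at 1000 ms.
        System.out.println(delays(100, 2.0, 1000, 5)); // [100, 200, 400, 800, 1000]
    }
}
```

With a multiplier of 1 the schedule is identical to fixed-delay, and a
multiplier between 1 and 2 gives a slower ramp-up for short queries.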

I've updated the FLIP accordingly and look forward to your feedback.

Regards,
Xiangyu Feng


On Mon, Jan 8, 2024 at 15:29, Rui Fan <1996fan...@gmail.com> wrote:

> Having only one strategy is fine with me.
>
> When the multiplier is set to 1, the exponential-delay will become
> fixed-delay.
> So fixed-delay may not be needed.
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang <zjur...@gmail.com> wrote:
>
> > I agree with @Rui that the current configuration for Flink Client is a
> > little complex. Can we just provide one strategy with fewer configuration
> > items for all scenarios?
> >
> > Best,
> > Fang Yong
> >
> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks xiangyu for driving this proposal! And sorry for the
> > > late reply.
> > >
> > > Overall looks good to me, I only have some minor questions:
> > >
> > > 1. Do we need to introduce 3 collect strategies in the first version?
> > >
> > > Large and comprehensive configuration items will bring
> > > additional learning costs and usage costs to users. I tend to
> > > provide users with out-of-the-box parameters and 2 collect
> > > strategies may be enough for users.
> > >
> > > IIUC, there is no big difference between exponential-delay and
> > > incremental-delay, especially with the default parameters provided.
> > > I wonder whether we could provide a multiplier for the exponential-delay
> > > strategy and remove the incremental-delay strategy?
> > >
> > > Of course, if you think multiplier option is not needed based on
> > > your production experience, it's totally fine for me. Simple is better.
> > >
> > > 2. Which strategy do you think is best in mass production?
> > >
> > > I'm working on FLIP-364[1], it's related to Flink failover restart
> > > strategy. IIUC, when one cluster only has a few flink jobs,
> > > fixed-delay is fine. It guarantees minimal latency without too
> > > much stress. But if one cluster has too many jobs, fixed-delay
> > > may not be stable.
> > >
> > > Do you think exponential-delay is better than fixed delay in this
> > > scenario? And which strategy is used in your production for now?
> > > Would you mind sharing it?
> > >
> > > Looking forward to your opinion~
> > >
> > > Best,
> > > Rui
> > >
> > > > On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng <xiangyu...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > > If there is no further comment, we will open the voting thread next
> > > > > week.
> > > >
> > > > Regards,
> > > > Xiangyu
> > > >
> > > > On Wed, Jan 3, 2024 at 16:46, Zhanghao Chen <zhanghao.c...@outlook.com> wrote:
> > > >
> > > > > > Thanks for driving this effort on improving the interactive use
> > > > > > experience of Flink. The proposal overall looks good to me.
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > ________________________________
> > > > > From: xiangyu feng <xiangyu...@gmail.com>
> > > > > Sent: Tuesday, December 26, 2023 16:51
> > > > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> > > > > interactive scenarios
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > > I'm opening this thread to discuss FLIP-407: Improve Flink Client
> > > > > > performance in interactive scenarios. The POC test results and
> > > > > > design doc can be found at: FLIP-407
> > > > > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters>.
> > > > >
> > > > > > Currently, Flink Client is mainly designed for one-time interaction
> > > > > > with the Flink Cluster. All the resources (HTTP connections, threads,
> > > > > > HA services) and instances (ClusterDescriptor, ClusterClient,
> > > > > > RestClient) are created and recycled for each interaction. This works
> > > > > > well when users do not need to interact frequently with the Flink
> > > > > > Cluster, and it also saves resources since they are recycled
> > > > > > immediately after each use.
> > > > >
> > > > > > However, in OLAP or StreamingWarehouse scenarios, users might submit
> > > > > > interactive jobs to a dedicated Flink Session Cluster very often. In
> > > > > > this case, we find that short queries that finish in less than 1s in
> > > > > > the Flink Cluster still have an E2E latency greater than 2s. Hence, we
> > > > > > propose this FLIP to improve the Flink Client performance in this
> > > > > > scenario. This could also improve the user experience when using
> > > > > > session debug mode.
> > > > >
> > > > > > The major change in this FLIP is a newly introduced option,
> > > > > > *'execution.interactive-client'*. When this option is enabled, Flink
> > > > > > Client will reuse all the necessary resources to improve interactive
> > > > > > performance, including HA services, HTTP connections, threads, and
> > > > > > all kinds of instances related to a long-running Flink Cluster. The
> > > > > > default value of this option will be false, in which case Flink
> > > > > > Client will behave as before.
> > > > >
> > > > > > Also, this FLIP proposes a configurable RetryStrategy for fetching
> > > > > > results from the Flink Cluster on the client side. In interactive
> > > > > > scenarios, this can save more than 15% of TM CPU usage without
> > > > > > performance degradation.
> > > > >
> > > > > Looking forward to your feedback, thanks.
> > > > >
> > > > > Best regards,
> > > > > Xiangyu
> > > > >
> > > >
> > >
> >
>
