Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

Yong Fang Sun, 07 Jan 2024 22:19:33 -0800

I agree with @Rui that the current configuration for the Flink Client is a
little complex. Can we just provide one strategy with fewer configuration
items for all scenarios?


Best,
Fang Yong

On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <[email protected]> wrote:

> Thanks xiangyu for driving this proposal! And sorry for the
> late reply.
>
> Overall it looks good to me. I only have some minor questions:
>
> 1. Do we need to introduce 3 collect strategies in the first version?
>
> A large and comprehensive set of configuration items will bring
> additional learning and usage costs to users. I would prefer to
> provide users with out-of-the-box parameters, and 2 collect
> strategies may be enough for users.
>
> IIUC, there is no big difference between exponential-delay and
> incremental-delay, especially with the default parameters provided.
> I wonder whether we could provide a multiplier for the exponential-delay
> strategy and remove the incremental-delay strategy?
>
> Of course, if you think the multiplier option is not needed based on
> your production experience, that's totally fine with me. Simple is better.
>
> 2. Which strategy do you think is best in mass production?
>
> I'm working on FLIP-364[1], which is related to the Flink failover restart
> strategy. IIUC, when a cluster has only a few Flink jobs,
> fixed-delay is fine. It guarantees minimal latency without too
> much stress. But if a cluster has too many jobs, fixed-delay
> may not be stable.
>
> Do you think exponential-delay is better than fixed-delay in this
> scenario? And which strategy do you currently use in production?
> Would you mind sharing it?
>
> Looking forward to your opinion~
>
> Best,
> Rui
>
> On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng <[email protected]> wrote:
>
> > Hi all,
> >
> > Thanks for the comments.
> >
> > If there is no further comment, we will open the voting thread next week.
> >
> > Regards,
> > Xiangyu
> >
> > On Wed, Jan 3, 2024 at 4:46 PM Zhanghao Chen <[email protected]> wrote:
> >
> > > Thanks for driving this effort on improving the interactive use experience
> > > of Flink. The proposal overall looks good to me.
> > >
> > > Best,
> > > Zhanghao Chen
> > > ________________________________
> > > From: xiangyu feng <[email protected]>
> > > Sent: Tuesday, December 26, 2023 16:51
> > > To: [email protected] <[email protected]>
> > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> > > interactive scenarios
> > >
> > > Hi devs,
> > >
> > > I'm opening this thread to discuss FLIP-407: Improve Flink Client
> > > performance in interactive scenarios. The POC test results and design doc
> > > can be found at: FLIP-407
> > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters>.
> > >
> > > Currently, the Flink Client is mainly designed for one-time interaction with
> > > the Flink Cluster. All the resources (HTTP connections, threads, HA
> > > services) and instances (ClusterDescriptor, ClusterClient, RestClient) are
> > > created and recycled for each interaction. This works well when users do
> > > not need to interact frequently with the Flink Cluster, and it also saves
> > > resource usage since resources are recycled immediately after each use.
> > >
> > > However, in OLAP or StreamingWarehouse scenarios, users might submit
> > > interactive jobs to a dedicated Flink Session Cluster very often. In this
> > > case, we find that short queries that can finish in less than 1s in the
> > > Flink Cluster still have an E2E latency greater than 2s. Hence, we
> > > propose this FLIP to improve the Flink Client performance in this scenario.
> > > This could also improve the user experience when using session debug mode.
> > >
> > > The major change in this FLIP is a newly introduced option,
> > > *'execution.interactive-client'*. When this option is enabled, the Flink
> > > Client will reuse all the necessary resources to improve interactive
> > > performance, including: HA Services, HTTP connections, threads and all
> > > kinds of instances related to a long-running Flink Cluster. The default
> > > value of this option will be false, in which case the Flink Client will
> > > behave as before.
> > >
> > > Also, this FLIP proposes a configurable RetryStrategy for fetching results
> > > from the Flink Cluster on the client side. In interactive scenarios, this can
> > > save more than 15% of TM CPU usage without performance degradation.
> > >
> > > Looking forward to your feedback, thanks.
> > >
> > > Best regards,
> > > Xiangyu
> > >
> >
>
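
For readers following the strategy discussion quoted above, here is a minimal
sketch of how the three delay strategies named in the thread (fixed-delay,
incremental-delay, exponential-delay) might differ. It is written in Python
for brevity and is not the FLIP-407 implementation. The function names, the
parameter names (initial_ms, increment_ms, multiplier, max_ms) and the sample
values are all assumptions made for illustration.

```python
from typing import Iterator, List


def fixed_delay(delay_ms: int) -> Iterator[int]:
    """Yield the same wait time before every retry."""
    while True:
        yield delay_ms


def incremental_delay(initial_ms: int, increment_ms: int, max_ms: int) -> Iterator[int]:
    """Grow the wait time by a constant step, capped at max_ms."""
    delay = initial_ms
    while True:
        yield min(delay, max_ms)
        delay = min(delay + increment_ms, max_ms)


def exponential_delay(initial_ms: int, multiplier: float, max_ms: int) -> Iterator[int]:
    """Multiply the wait time by `multiplier` after each retry, capped at max_ms."""
    delay = float(initial_ms)
    while True:
        yield min(int(delay), max_ms)
        delay = min(delay * multiplier, float(max_ms))


def first_delays(strategy: Iterator[int], n: int) -> List[int]:
    """Collect the first n wait times produced by a strategy."""
    return [next(strategy) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to make the shapes visible.
    print("fixed      ", first_delays(fixed_delay(100), 6))
    print("incremental", first_delays(incremental_delay(100, 100, 500), 6))
    print("exponential", first_delays(exponential_delay(100, 2.0, 1000), 6))
```

With a small multiplier, the first few exponential delays stay fairly close to
what an incremental strategy would produce. That may be the overlap Rui
describes when suggesting a multiplier for exponential-delay in place of a
separate incremental-delay strategy.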
