Re: [DISCUSS] FLIP-505: Flink History Server Scability Improvements, Remote Data Store Fetch and Per Job Fetch

Allison Mon, 03 Mar 2025 13:49:15 -0800

Hi Yanquan,

I've updated the FLIP to contain the default values, thanks for your help!


Sincerely
- Allison

On Thu, Jan 30, 2025 at 3:21 AM Yanquan Lv <[email protected]> wrote:

> Thank you for your explanation. It has largely resolved my previous
> questions.
>
> Regarding the second point, I would like to suggest clarifying the default
> values for the newly added parameters in the `Public Interfaces` section.
>
> ---------- Forwarded message ---------
> From: Allison <[email protected]>
> Date: Thu, Jan 30, 2025 at 3:42 AM
> Subject: Re: [DISCUSS] FLIP-505: Flink History Server Scability
> Improvements, Remote Data Store Fetch and Per Job Fetch
> To: <[email protected]>
>
>
> Hi Yanquan,
>
> Thanks for taking a look at this. Re: your questions:
>
> 1. Yes. I've updated the FLIP to make this clearer: it involves renaming
> the existing configuration historyserver.archive.retained-jobs to
> historyserver.archive.cached-retained-jobs. The number of jobs stored
> remotely can be unbounded. The thought behind this is that the remote data
> storage can be cleaned up or limited by a separate protocol that can be
> customized to each individual use case.
> 2. Could you clarify this a bit? I'm not sure I understand this part. Do
> you mean adding to the FLIP the values the configurations would take when
> they are not explicitly set?
> 3. historyserver.archive.fs.refresh-interval is the interval between calls
> to the remote data storage to find fresh data. In other words, it
> configures how often the FHS polls the remote data store for new files. The
> remote data store is written to whenever a job finishes.
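
To make points 1 and 3 above a bit more concrete, here is a rough sketch of
how I picture the two settings interacting. This is not Flink code and not
the FLIP's implementation: the class ArchivePollSketch, its methods, and the
placeholder archive string are all hypothetical, and the constructor argument
only stands in for historyserver.archive.cached-retained-jobs. Each call to
poll() models one refresh-interval tick against the remote store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only, not Flink code: remote listing poll plus a bounded local cache. */
public class ArchivePollSketch {

    // Job IDs seen in the remote store; unbounded, as described in point 1.
    private final Set<String> knownRemoteJobs = new LinkedHashSet<>();

    // Local cache in access order; the least recently used entry is evicted when full.
    private final Map<String, String> localCache;

    /** cachedRetainedJobs is a hypothetical stand-in for the renamed option. */
    public ArchivePollSketch(final int cachedRetainedJobs) {
        this.localCache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cachedRetainedJobs;
            }
        };
    }

    /** One poll cycle: returns the job IDs in the remote listing not seen before. */
    public List<String> poll(List<String> remoteListing) {
        List<String> fresh = new ArrayList<>();
        for (String jobId : remoteListing) {
            if (knownRemoteJobs.add(jobId)) {
                fresh.add(jobId);
            }
        }
        return fresh;
    }

    /** Serves a job from the local cache, pulling it from the remote store on a miss. */
    public String fetch(String jobId) {
        String cached = localCache.get(jobId);
        if (cached != null) {
            return cached;
        }
        if (!knownRemoteJobs.contains(jobId)) {
            return null; // never listed by any poll
        }
        String archive = "archive-of-" + jobId; // placeholder for a real remote read
        localCache.put(jobId, archive);
        return archive;
    }

    public boolean isCachedLocally(String jobId) {
        return localCache.containsKey(jobId);
    }

    public int knownRemoteCount() {
        return knownRemoteJobs.size();
    }
}
```

With a limit of 2, fetching three jobs in a row leaves only the last two in
the local cache, while all of them stay known (and fetchable) remotely.
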
>
> Hope this clarifies some things.
>
> Best,
> - Allison
>
>
> On Mon, Jan 27, 2025 at 7:10 PM Yanquan Lv <[email protected]> wrote:
>
> > Hi, Allison. Thanks for driving this FLIP.
> > I have some questions to confirm:
> >
> > 1. I can’t find any existing configuration named
> > `historyserver.archive.cached-retained-jobs`. I guess what you mean is
> > modifying the existing configuration from
> > `historyserver.archive.retained-jobs` to
> > `historyserver.archive.cached-retained-jobs`. If so, and we only limit
> > the number of retained jobs stored locally, is the number of retained
> > jobs stored remotely infinite?
> > 2. I think it would be better to provide instructions for adding default
> > values to HistoryServerOptions.
> > 3. Does `historyserver.archive.fs.refresh-interval` apply to both local
> > and remote storage simultaneously?
> >
> > Best,
> > Yanquan
> >
> > On Fri, Jan 17, 2025 at 8:07 AM Allison <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > I would like to initiate a discussion for the FLIP below, which
> > > enhances the Flink History Server to allow greater scalability of the
> > > service.
> > >
> > > Motivation:
> > >
> > > Currently, the Flink History Server (FHS) is limited in the number of
> > > job archives it can serve by the storage capacity of the node that the
> > > FHS runs on. Job archives are stored locally in a cache, which creates a
> > > local directory that is expanded out based on the contents of a single
> > > JSON archive file. This not only uses up local storage space; because
> > > the FHS expands the job archives into a nested directory structure,
> > > inode space also often runs out for jobs with a large number of
> > > taskmanagers or subtasks. In order to make the FHS more performant, we
> > > would like to decouple job archive storage from the local cache by
> > > introducing the ability to store and fetch job archives from a remote
> > > file store.
> > >
> > > FLIP proposal document:
> > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> > >
> > > Thanks!
> > >
> > > Best,
> > > - Allison Chang
> > >
> >
>
