Hi Yuepeng,

Sorry for the confusion. I meant to ask: what is the use case for the
ttlOrQuantity mode? Is it sufficient to delete a job archive when either the
TTL or the quantity limit is reached, if both are set?

Regarding the case with multiple history server instances: if we don't
enforce a behavior, users can go with either a) or b), and it would just be
up to the user to choose. We do need to document the behavior properly.

Thanks,

Jiangjie (Becket) Qin


On Mon, Aug 18, 2025 at 10:28 PM Yuepeng Pan <panyuep...@apache.org> wrote:

> Hi, Becket.
>
> Thank you for your comments.
>
> > 1. What is the use case for ttlAndQuantity mode? It seems usually the
> > desired behavior is ttlOrQuantity. If so, we can just add a ttl
> > retention config.
>
> The ttlAndQuantity mode means that a file in the remote directory is
> retained only if its modification time is within the configured TTL and
> the total number of files does not exceed the maximum limit.
>
> One of the main purposes of this configuration item is to guard against
> the following situations:
>
> - Within the TTL, the number of files grows too large, leading to
> excessive storage usage or too many files.
>
> - Files stay within the file quantity threshold, but their modification
> times far exceed the TTL.
>
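To make the difference between the two combined modes concrete, here is a minimal Java sketch. It assumes that "AND" means an archive is kept only while it is within both the TTL and the quantity limit, and that "OR" means it is kept while it is within at least one of them. The class, record, and method names are hypothetical and are not taken from the FLIP or from Flink. Archives are ranked newest first, so "within quantity" means "among the maxCount most recently modified".

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch contrasting AND- and OR-combined TTL/quantity retention. */
public class RetentionSketch {

    /** Hypothetical stand-in for a job archive file: its name and modification time. */
    public record Archive(String name, Instant modified) {}

    /**
     * Returns the archives to keep, newest first. An archive is "within quantity" if it is
     * among the maxCount most recently modified ones, and "within TTL" if it was modified
     * no earlier than now - ttl. With requireBoth = true (AND) both must hold; with
     * requireBoth = false (OR) either one is enough. Everything else would be deleted.
     */
    public static List<Archive> retained(
            List<Archive> archives, Duration ttl, int maxCount, Instant now, boolean requireBoth) {
        List<Archive> newestFirst = new ArrayList<>(archives);
        newestFirst.sort(Comparator.comparing(Archive::modified).reversed());
        Instant cutoff = now.minus(ttl);

        List<Archive> keep = new ArrayList<>();
        for (int rank = 0; rank < newestFirst.size(); rank++) {
            Archive archive = newestFirst.get(rank);
            boolean withinTtl = !archive.modified().isBefore(cutoff);
            boolean withinQuantity = rank < maxCount;
            boolean retain =
                    requireBoth ? (withinTtl && withinQuantity) : (withinTtl || withinQuantity);
            if (retain) {
                keep.add(archive);
            }
        }
        return keep;
    }
}
```

Under this reading, AND-retention is equivalent to deleting an archive as soon as either limit is exceeded, while OR-retention deletes an archive only once both limits are exceeded.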
> > 2. When there are multiple history server instances with different
> > configurations, they are working independently today and may have
> > conflicting configs. This is an existing problem, but since we are adding
> > more configs to the retention policy, it increases the chance of config
> > conflicts. It would be good to have a clear user story for when there are
> > multiple history server instances.
>
> This is indeed a good question.
>
> What do you think about adding a description like the following to the
> section on the newly introduced configuration items in the FLIP?
>
> a. If multiple HistoryServer instances use the same
> historyserver.archive.fs.dir directory as their refresh directory, enable
> and configure this feature in only one of them, to avoid errors caused by
> multiple instances cleaning up remote files at the same time.
>
> -OR-
>
> b. If multiple HistoryServer instances use the same
> historyserver.archive.fs.dir directory as their refresh directory, keep
> the value of this configuration consistent across all of them.
>
> Regardless of whether option a or option b is chosen, it is necessary to
> enhance the corresponding exception handling when reading from and deleting
> remote files.
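For the exception handling mentioned above, one possible approach is to treat an archive that has already disappeared as successfully cleaned up, since another instance may have deleted it first, and to leave other failures for the next refresh round. The sketch below uses java.nio.file as a stand-in for the remote file system (an actual implementation would presumably go through Flink's own FileSystem abstraction); the class, enum, and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: delete an expired archive while tolerating concurrent deletes. */
public class TolerantDeleteSketch {

    /** Hypothetical result type so callers can log or count each case. */
    public enum Outcome { DELETED, ALREADY_GONE, FAILED }

    public static Outcome deleteExpiredArchive(Path archive) {
        try {
            // deleteIfExists returns false instead of throwing when the file is already gone,
            // e.g. because another HistoryServer instance removed it first.
            return Files.deleteIfExists(archive) ? Outcome.DELETED : Outcome.ALREADY_GONE;
        } catch (IOException e) {
            // Do not fail the whole fetch round; leave the archive in place so the
            // next refresh round can retry the deletion.
            System.err.println("Could not delete " + archive + ": " + e);
            return Outcome.FAILED;
        }
    }
}
```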
>
> I’m really looking forward to hearing about other suitable solutions to
> the items above.
>
> Please let me know your opinion.
>
> Best,
> Yuepeng Pan
>
> At 2025-08-19 00:37:26, "Becket Qin" <becket....@gmail.com> wrote:
> >Thanks for the proposal, Yuepeng.
> >
> >I think this FLIP is mostly orthogonal to FLIP-505. This FLIP essentially
> >tries to improve the retention policy of the actual archives, while
> >FLIP-505 mainly focuses on caching. One connection between the two FLIPs
> >might be when the actual archive expires and gets removed, it might make
> >sense to also remove the local cache.
> >
> >A few questions about this FLIP:
> >
> >1. What is the use case for ttlAndQuantity mode? It seems usually the
> >desired behavior is ttlOrQuantity. If so, we can just add a ttl retention
> >config.
> >2. When there are multiple history server instances with different
> >configurations, they are working independently today and may have conflicting
> >configs. This is an existing problem, but since we are adding more configs
> >to the retention policy, it increases the chance of config conflicts. It
> >would be good to have a clear user story for when there are multiple
> >history server instances.
> >
> >Thanks,
> >
> >Jiangjie (Becket) Qin
> >
> >On Thu, Aug 14, 2025 at 1:56 PM Allison <achang5...@gmail.com> wrote:
> >
> >> Hi Yuepeng,
> >>
> >> It looks like this work complements the change I've proposed in
> >> FLIP-505. This addresses the question Ryan asked about whether remotely
> >> stored job archives will be affected if the retention is changed. Feel
> >> free to take a look at the FLIP as well as the PR for FLIP-505. Together,
> >> these two changes give us the opportunity to significantly improve the
> >> History Server.
> >>
> >> FLIP-505:
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> >> PR: https://github.com/apache/flink/pull/26878
> >>
> >> Best,
> >> Allison
> >>
> >>
> >> On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >>
> >> > Hi, Ryan van Huuksloot.
> >> >
> >> > > Might be worth stating that explicitly in the FLIP.
> >> > Nice idea! I've added a sub-section [1] to clarify this.
> >> >
> >> > Thanks a lot !
> >> >
> >> > [1]
> >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
> >> >
> >> > Best,
> >> > Yuepeng Pan
> >> >
> >> > On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
> >> > > That sounds like a good option.
> >> > >
> >> > > Might be worth stating that explicitly in the FLIP.
> >> > >
> >> > > No other questions from me - will be a nice extension!
> >> > >
> >> > > Ryan van Huuksloot
> >> > > Staff Engineer, Infrastructure | Streaming Platform
> >> > > [image: Shopify]
> >> > > <
> >> https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> > >
> >> > > > Hi, Ryan van Huuksloot.
> >> > > >
> >> > > > >Are you planning on having a thread to check for TTL? Or what is the
> >> > > > >plan for TTL?
> >> > > > >The quantity based would have a check when a new job is archived?
> >> > > >
> >> > > > As in the POC [1], what do you think about continuing with the
> >> > > > existing process, in which the HistoryServer#start method
> >> > > > periodically invokes HistoryServerArchiveFetcher#fetchArchives at the
> >> > > > interval set by 'historyserver.archive.fs.refresh-interval', and
> >> > > > checking on each run whether the target files should be retained?
> >> > > > Of course, I'm very open to hearing about other, potentially better
> >> > > > implementation approaches.
> >> > > > Please let me know what you think.
> >> > > > Thank you.
> >> > > >
> >> > > > [1] https://github.com/apache/flink/pull/26902
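A minimal sketch of that kind of periodic check, using only java.util.concurrent. Here fetchAndApplyRetention() is a hypothetical stand-in for fetchArchives plus the retention check, not Flink's actual code, and the class name is likewise hypothetical. scheduleWithFixedDelay starts the next round only after the previous one finishes, so a slow remote listing never overlaps with the next round.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: run archive fetching and retention checks on a fixed refresh interval. */
public class PeriodicRetentionSketch {

    /** Number of completed rounds, exposed only so the behaviour can be observed. */
    public final AtomicInteger rounds = new AtomicInteger();

    /** Hypothetical stand-in for HistoryServerArchiveFetcher#fetchArchives plus retention. */
    public void fetchAndApplyRetention() {
        // 1. List the refresh directories and fetch newly archived jobs.
        // 2. Evaluate the TTL/quantity policy and delete archives no longer retained.
        rounds.incrementAndGet();
    }

    /** Runs a round every refreshIntervalMillis, measured from the end of the previous round. */
    public ScheduledExecutorService start(long refreshIntervalMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(
                () -> {
                    try {
                        fetchAndApplyRetention();
                    } catch (RuntimeException e) {
                        // An exception escaping the task would suppress all later rounds.
                        System.err.println("Archive fetch round failed: " + e);
                    }
                },
                0L,
                refreshIntervalMillis,
                TimeUnit.MILLISECONDS);
        return executor;
    }

    /** Polls until at least n rounds have completed or the timeout elapses. */
    public boolean awaitRounds(int n, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (rounds.get() < n) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

With this design, both the TTL and the quantity checks piggyback on the existing refresh loop, so no extra thread is needed.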
> >> > > >
> >> > > > Best,
> >> > > > Yuepeng Pan
> >> > > >
> >> > > >
> >> > > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
> >> > > > > Thanks, sounds good.
> >> > > > >
> >> > > > > Are you planning on having a thread to check for TTL? Or what is the
> >> > > > > plan for TTL?
> >> > > > > The quantity based would have a check when a new job is archived?
> >> > > > >
> >> > > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> > > > >
> >> > > > > > Hi, Ryan van Huuksloot.
> >> > > > > >
> >> > > > > > Thank you very much for your reply.
> >> > > > > >
> >> > > > > > > Question: Is the History Server then going to delete the files
> >> > > > > > > stored? (i.e. we use GCS, would it delete the files there as
> >> > > > > > > well?) Or is this strictly what is shown in the UI?
> >> > > > > >
> >> > > > > > Yes. The feature introduced in this FLIP is a superset of the
> >> > > > > > existing feature controlled by 'historyserver.archive.retained-jobs'.
> >> > > > > >
> >> > > > > > So, if I understand correctly, once the new feature is introduced,
> >> > > > > > it will also affect how long job history files are retained in
> >> > > > > > remote distributed storage, not only what is shown in the UI.
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Yuepeng Pan
> >> > > > > >
> >> > > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot" <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
> >> > > > > > >I took a look. Overall it would be nice to have more ways to
> >> > > > > > >configure the History Server.
> >> > > > > > >
> >> > > > > > >Question: Is the History Server then going to delete the files
> >> > > > > > >stored?
> >> > > > > > >(i.e. we use GCS, would it delete the files there as well?)
> >> > > > > > >Or is this strictly what is shown in the UI?
> >> > > > > > >
> >> > > > > > >On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> >> > > > > > >
> >> > > > > > >> Bumping this thread. Thanks!
> >> > > > > > >>
> >> > > > > > >> Best,
> >> > > > > > >> Yuepeng Pan
> >> > > > > > >>
> >> > > > > > >> On 2025/08/11 03:49:27 Yuepeng Pan wrote:
> >> > > > > > >> > Hi community,
> >> > > > > > >> >
> >> > > > > > >> > Currently, HistoryServer supports only a quantity-based job
> >> > > > > > >> > archive retention policy [1].
> >> > > > > > >> > This is insufficient for scenarios such as:
> >> > > > > > >> > - Time-based retention (e.g., last X days).
> >> > > > > > >> > - Combined rules (e.g., within 7 days AND ≤100 jobs).
> >> > > > > > >> >
> >> > > > > > >> > To address these limitations, I’d like to start a discussion
> >> > > > > > >> > on FLIP-490 [2], which proposes a more flexible job archive
> >> > > > > > >> > retention mechanism that supports time-based, quantity-based,
> >> > > > > > >> > and composite strategies (with AND/OR logic).
> >> > > > > > >> >
> >> > > > > > >> > Looking forward to your feedback.
> >> > > > > > >> >
> >> > > > > > >> > Best,
> >> > > > > > >> > Yuepeng Pan
> >> > > > > > >> >
> >> > > > > > >> > [1]
> >> > > > > > >> > https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
> >> > > > > > >> > [2]
> >> > > > > > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
