Thanks for the proposal, Yuepeng.

I think this FLIP is mostly orthogonal to FLIP-505. This FLIP essentially
tries to improve the retention policy of the actual archives, while
FLIP-505 mainly focuses on caching. One connection between the two FLIPs
is that when the actual archive expires and gets removed, it probably
makes sense to remove the local cache as well.

A few questions about this FLIP:

1. What is the use case for ttlAndQuantity mode? It seems that the
desired behavior is usually ttlOrQuantity. If so, we can just add a TTL
retention config.
2. When there are multiple history server instances with different
configurations, they work independently today and may have conflicting
configs. This is an existing problem, but since we are adding more configs
to the retention policy, the chance of config conflicts increases. It
would be good to have a clear user story for the case where there are
multiple history server instances.
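
To make question 1 concrete, here is a minimal Python sketch of the two
ways a TTL limit and a quantity limit could be combined. Everything in it
(the Archive class, the retained function, the "keep_if_both" and
"keep_if_either" mode names) is hypothetical and not taken from the FLIP
or the Flink code base. In particular, it does not settle which
combination the FLIP calls ttlAndQuantity and which it calls
ttlOrQuantity.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Set


@dataclass(frozen=True)
class Archive:
    """Hypothetical stand-in for one archived job."""
    job_id: str
    age: timedelta  # time elapsed since the archive was written


def retained(archives: List[Archive], ttl: timedelta,
             max_jobs: int, mode: str) -> Set[str]:
    """Return the job ids that survive a retention pass.

    mode == "keep_if_both":   keep only archives inside BOTH limits.
    mode == "keep_if_either": keep archives inside AT LEAST ONE limit.
    """
    # Rank archives newest first so the quantity limit keeps the newest ones.
    newest_first = sorted(archives, key=lambda a: a.age)
    keep = set()
    for rank, archive in enumerate(newest_first):
        within_ttl = archive.age <= ttl
        within_quota = rank < max_jobs
        if mode == "keep_if_both":
            ok = within_ttl and within_quota
        elif mode == "keep_if_either":
            ok = within_ttl or within_quota
        else:
            raise ValueError(f"unknown mode: {mode}")
        if ok:
            keep.add(archive.job_id)
    return keep


archives = [
    Archive("job-a", timedelta(days=1)),
    Archive("job-b", timedelta(days=3)),
    Archive("job-c", timedelta(days=10)),
]
ttl = timedelta(days=7)
print(sorted(retained(archives, ttl, 1, "keep_if_both")))    # ['job-a']
print(sorted(retained(archives, ttl, 1, "keep_if_either")))  # ['job-a', 'job-b']
```

With a 7-day TTL and a quota of one job, the stricter combination keeps
only job-a, while the looser one also keeps job-b because it is still
inside the TTL. job-c is outside both limits and is dropped either way.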

Thanks,

Jiangjie (Becket) Qin

On Thu, Aug 14, 2025 at 1:56 PM Allison <achang5...@gmail.com> wrote:

> Hi Yuepeng,
>
> Looks like this work can have some symbiosis with the change that I've
> proposed here in FLIP-505. This addresses the question that Ryan asked
> about whether or not remotely stored job archives will be impacted if the
> retention is changed. Feel free to take a look at the FLIP as well as the
> PR for FLIP-505. Looks like we have the opportunity to significantly
> improve the History Server with these two changes.
>
> FLIP-505:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> PR: https://github.com/apache/flink/pull/26878
>
> Best,
> Allison
>
>
> On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>
> > Hi, Ryan van Huuksloot.
> >
> > > Might be worth stating that explicitly in the FLIP.
> > Nice idea. I have added a sub-section here [1] to clarify the item.
> >
> > Thanks a lot!
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
> >
> > Best,
> > Yuepeng Pan
> >
> > On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
> > > That sounds like a good option.
> > >
> > > Might be worth stating that explicitly in the FLIP.
> > >
> > > No other questions from me - will be a nice extension!
> > >
> > > Ryan van Huuksloot
> > > Staff Engineer, Infrastructure | Streaming Platform
> > > [image: Shopify]
> > > <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
> > >
> > >
> > > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > >
> > > > Hi, Ryan van Huuksloot.
> > > >
> > > > > Are you planning on having a thread to check for TTL? Or what is the plan
> > > > > for TTL?
> > > > > Would the quantity-based check run when a new job is archived?
> > > >
> > > > Just like the implementation in the POC [1], we could keep following the
> > > > existing process, where the HistoryServer#start method periodically invokes
> > > > HistoryServerArchiveFetcher#fetchArchives based on
> > > > 'historyserver.archive.fs.refresh-interval', and use that invocation to
> > > > check whether target files should be retained. What do you think?
> > > > Of course, I'm very open to hearing about other, potentially better
> > > > implementation approaches.
> > > > Please let me know your opinion.
> > > > Thank you.
> > > >
> > > > [1] https://github.com/apache/flink/pull/26902
> > > >
> > > > Best,
> > > > Yuepeng Pan
> > > >
> > > >
> > > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
> > > > > Thanks, sounds good.
> > > > >
> > > > > Are you planning on having a thread to check for TTL? Or what is the plan
> > > > > for TTL?
> > > > > Would the quantity-based check run when a new job is archived?
> > > > >
> > > > > Ryan van Huuksloot
> > > > > Staff Engineer, Infrastructure | Streaming Platform
> > > > >
> > > > >
> > > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > >
> > > > > > Hi, Ryan van Huuksloot.
> > > > > >
> > > > > > > Thank you very much for your reply.
> > > > > > >
> > > > > > > > Question: Is the History Server then going to delete the files stored?
> > > > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > > > > > > > Or is this strictly what is shown in the UI?
> > > > > > >
> > > > > > > Yes, the feature introduced in this FLIP is a superset of the original
> > > > > > > feature that is controlled by 'historyserver.archive.retained-jobs'.
> > > > > > >
> > > > > > > So if I understand correctly, after the new feature is introduced, it
> > > > > > > would also affect how long job history files are retained in remote
> > > > > > > distributed storage, not only what is shown in the UI.
> > > > > > >
> > > > > > Best,
> > > > > > Yuepeng Pan
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot"
> > > > > > <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
> > > > > > > > I took a look. Overall it would be nice to have more ways to configure
> > > > > > > > the History Server.
> > > > > > > >
> > > > > > > > Question: Is the History Server then going to delete the files stored?
> > > > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > > > > > > > Or is this strictly what is shown in the UI?
> > > > > > >
> > > > > > >Ryan van Huuksloot
> > > > > > >Staff Engineer, Infrastructure | Streaming Platform
> > > > > > >
> > > > > > >
> > > > > > > > On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > > > >
> > > > > > >> Bumping this thread. Thanks!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Yuepeng Pan
> > > > > > >>
> > > > > > >> On 2025/08/11 03:49:27 Yuepeng Pan wrote:
> > > > > > >> > Hi community,
> > > > > > >> >
> > > > > > >> >
> > > > > > > >> > Currently, HistoryServer supports only a quantity-based job archive
> > > > > > > >> > retention policy [1].
> > > > > > >> > This is insufficient for scenarios such as:
> > > > > > >> > - Time-based retention (e.g., last X days).
> > > > > > >> > - Combined rules (e.g., within 7 days AND ≤100 jobs).
> > > > > > >> >
> > > > > > >> >
> > > > > > > >> > To address these limitations, I’d like to start a discussion on
> > > > > > > >> > FLIP-490 [2], which proposes a more flexible job archive retention
> > > > > > > >> > mechanism that supports time-based, quantity-based, and composite
> > > > > > > >> > strategies (with AND/OR logic).
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Looking forward to your feedback.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > Yuepeng Pan
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > [1]
> > > > > > >>
> > > > > >
> > > >
> >
> https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
> > > > > > >> > [2]
> > > > > > >>
> > > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
