Hi, Becket.

Thank you for the clarification.
Let me revisit these two questions with an explanation:

> I meant to ask what is the use case for 
> ttlOrQuantity mode? Is it sufficient to delete the job archive when either 
> TTL or quantity is reached if both are set?

As the configuration key 'historyserver.archive.retained-jobs.mode' suggests,
this option governs the retention mode for archived jobs.
When set to 'ttlOrQuantity', an archive file is retained if either of the
following conditions is met (in other words, it is deleted only when neither
condition holds):

- The file count is within the maximum retention threshold.
- The file remains within the TTL (Time to Live) period.
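
To make the rule above concrete, here is a minimal sketch in Java. It is only
an illustration and not the FLIP's or the POC's actual code: the class, enum,
method, and parameter names are all hypothetical, and it assumes that "within
the maximum retention threshold" means the archive's position in a
newest-first ordering is below the configured maximum. The 'ttlAndQuantity'
branch follows the description given earlier in this thread.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only; every name here is hypothetical, not Flink API. */
final class RetentionSketch {

    /** Hypothetical stand-in for 'historyserver.archive.retained-jobs.mode'. */
    enum Mode { TTL_OR_QUANTITY, TTL_AND_QUANTITY }

    /**
     * Decides whether a single archive should be kept.
     *
     * @param rankByRecency 0-based position of the archive when all archives
     *                      are sorted newest first (an assumption of this sketch)
     * @param maxRetained   maximum number of archives to keep
     * @param modifiedAt    last modification time of the archive file
     * @param ttl           time-to-live for an archive
     * @param now           the time at which the check runs
     */
    static boolean shouldRetain(
            Mode mode,
            int rankByRecency,
            int maxRetained,
            Instant modifiedAt,
            Duration ttl,
            Instant now) {
        boolean withinQuantity = rankByRecency < maxRetained;
        boolean withinTtl = !now.isAfter(modifiedAt.plus(ttl));

        if (mode == Mode.TTL_OR_QUANTITY) {
            // Retained if either condition holds; deleted only when both fail.
            return withinQuantity || withinTtl;
        }
        // TTL_AND_QUANTITY: retained only if both conditions hold.
        return withinQuantity && withinTtl;
    }
}
```

For example, with a TTL of 7 days and a maximum of 100 archives, an archive
at position 150 that was modified one day ago is still retained under
'ttlOrQuantity' in this sketch, because it is within the TTL. The same
archive modified 30 days ago would be deleted.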

> Regarding the case when there are multiple history server instances, if we
> don't enforce a behavior, users can go with either a) and b), and it would
> just be up to the user to choose. We need to document the behavior properly.

Thanks for the comment. I have added the related content as a note [1] on
the new configuration option 'historyserver.archive.retained-jobs.mode'.
During the implementation phase, this description will be refined and added
to the corresponding configuration documentation.

Best,
Yuepeng Pan.

[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-PublicInterfaces



On 2025/08/20 04:55:24 Becket Qin wrote:
> Hi Yuepeng,
> 
> Sorry for the confusion. I meant to ask what is the use case for
> ttlOrQuantity mode? Is it sufficient to delete the job archive when either
> TTL or quantity is reached if both are set?
> 
> Regarding the case when there are multiple history server instances, if we
> don't enforce a behavior, users can go with either a) and b), and it would
> just be up to the user to choose. We need to document the behavior properly.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> On Mon, Aug 18, 2025 at 10:28 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> 
> > Hi, Becket.
> >
> > Thank you for your comments.
> >
> > > 1. What is the use case for ttlAndQuantity mode? It seems usually the
> >
> > > desired behavior is ttlOrQuantity. If so, we can just add a ttl
> > retention config.
> >
> >
> >
> >
> > The ttlAndQuantity mode means that files in the remote directory can only
> > be retained if their modification time is within the valid TTL
> >
> > and the total number of files does not exceed the maximum limit.
> >
> > One of the main purposes of this configuration item is to impose
> > restrictions on the following situations:
> >
> > - Within the TTL, the number of files grows too large, leading to
> > excessive storage usage or too many files.
> >
> > - Files remain within the file quantity threshold, but their modification
> > times far exceed the TTL.
> >
> >
> >
> >
> > > 2. When there are multiple history server instances with different
> >
> > > configurations, they are working independently today and may have
> > conflict
> >
> > > configs. This is an existing problem, but since we are adding more
> > configs
> >
> > > to the retention policy, it increases the chance of config conflicts. It
> >
> > > would be good to have a clear user story for when there are multiple
> > history server instances.
> >
> >
> >
> >
> > This is indeed a good question.
> >
> > What do you think if we add a description like the following to the newly
> > introduced configuration item section in the FLIP?
> >
> > a. If there are multiple HistoryServer instances using the same
> > historyserver.archive.fs.dir directory as the refresh directory,
> >
> >  you should enable and configure this feature in only one HistoryServer
> > instance to avoid errors caused by multiple instances simultaneously
> > cleaning up remote files.
> >
> > -OR-
> >
> > b. If there are multiple HistoryServer instances using the same
> > historyserver.archive.fs.dir directory as the refresh directory,
> >
> > you need to keep the value of this configuration consistent across them.
> >
> >
> >
> >
> > Regardless of whether option a or option b is chosen, it is necessary to
> > enhance the corresponding exception handling when reading from and deleting
> > remote files.
> >
> >
> >
> >
> > I’m really looking forward to hearing other suitable resolution candidates
> > about the above items.
> >
> > Please let me know your opinion.
> >
> > Best,
> > Yuepeng Pan
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2025-08-19 00:37:26, "Becket Qin" <becket....@gmail.com> wrote:
> > >Thanks for the proposal, Yuepeng.
> > >
> > >I think this FLIP is mostly orthogonal to FLIP-505. This FLIP essentially
> > >tries to improve the retention policy of the actual archives, while
> > >FLIP-505 mainly focuses on caching. One connection between the two FLIPs
> > >might be when the actual archive expires and gets removed, it might make
> > >sense to also remove the local cache.
> > >
> > >A few questions about this FLIP:
> > >
> > >1. What is the use case for ttlAndQuantity mode? It seems usually the
> > >desired behavior is ttlOrQuantity. If so, we can just add a ttl retention
> > >config.
> > >2. When there are multiple history server instances with different
> > >configurations, they are working independently today and may have conflict
> > >configs. This is an existing problem, but since we are adding more configs
> > >to the retention policy, it increases the chance of config conflicts. It
> > >would be good to have a clear user story for when there are multiple
> > >history server instances.
> > >
> > >Thanks,
> > >
> > >Jiangjie (Becket) Qin
> > >
> > >On Thu, Aug 14, 2025 at 1:56 PM Allison <achang5...@gmail.com> wrote:
> > >
> > >> Hi Yuepeng,
> > >>
> > >> Looks like this work can have some symbiosis with the change that I've
> > >> proposed here in FLIP-505. This addresses the question that Ryan asked
> > >> about whether or not remotely stored job archives will be impacted if
> > the
> > >> retention is changed. Feel free to take a look at the FLIP as well as
> > the
> > >> PR for FLIP-505. Looks like we have the opportunity to significantly
> > >> improve the History server with these two changes.
> > >>
> > >> FLIP-505:
> > >>
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> > >> PR: https://github.com/apache/flink/pull/26878
> > >>
> > >> Best,
> > >> Allison
> > >>
> > >>
> > >> On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org>
> > wrote:
> > >>
> > >> > Hi, Ryan van Huuksloot.
> > >> >
> > >> > > Might be worth stating that explicitly in the FLIP.
> > >> > Nice idea~ The sub-section added here[1] to clarify the item.
> > >> >
> > >> > Thanks a lot !
> > >> >
> > >> > [1]
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer
> > >> > -Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
> > >> >
> > >> > Best,
> > >> > Yuepeng Pan
> > >> >
> > >> > On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
> > >> > > That sounds like a good option.
> > >> > >
> > >> > > Might be worth stating that explicitly in the FLIP.
> > >> > >
> > >> > > No other questions from me - will be a nice extension!
> > >> > >
> > >> > > Ryan van Huuksloot
> > >> > > Staff Engineer, Infrastructure | Streaming Platform
> > >> > > [image: Shopify]
> > >> > > <
> > >> https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org
> > >
> > >> > wrote:
> > >> > >
> > >> > > > Hi, Ryan van Huuksloot.
> > >> > > >
> > >> > > > >Are you planning on having a thread to check for TTL? Or what is
> > the
> > >> > plan
> > >> > > > >for TTL?
> > >> > > > >The quantity based would have a check when a new job is archived?
> > >> > > >
> > >> > > > Just like the implementation in the POC[1], if we continue
> > following
> > >> > the
> > >> > > > process where
> > >> > > > HistoryServer#start method periodically invokes
> > >> > > > HistoryServerArchiveFetcher#fetchArchives
> > >> > > > based on 'historyserver.archive.fs.refresh-interval' to check
> > >> > > > whether target files should be retained, what do you think about
> > it ?
> > >> > > > Of course, I'm very open to hearing about other potentially better
> > >> > > > implementation approaches.
> > >> > > > Please let me know what's your opinion.
> > >> > > > Thank you.
> > >> > > >
> > >> > > > [1] https://github.com/apache/flink/pull/26902
> > >> > > >
> > >> > > > Best,
> > >> > > > Yuepeng Pan
> > >> > > >
> > >> > > >
> > >> > > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
> > >> > > > > Thanks, sounds good.
> > >> > > > >
> > >> > > > > Are you planning on having a thread to check for TTL? Or what is
> > >> the
> > >> > plan
> > >> > > > > for TTL?
> > >> > > > > The quantity based would have a check when a new job is
> > archived?
> > >> > > > >
> > >> > > > > Ryan van Huuksloot
> > >> > > > > Staff Engineer, Infrastructure | Streaming Platform
> > >> > > > > [image: Shopify]
> > >> > > > > <
> > >> >
> > https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <
> > >> panyuep...@apache.org>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > > > Hi, Ryan van Huuksloot.
> > >> > > > > >
> > >> > > > > > Thank you very much for your reply.
> > >> > > > > >
> > >> > > > > > > Question: Is the History Server then going to delete the
> > >> > > > > > > files stored?
> > >> > > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > >> > > > > > > Or is this strictly what is shown in the UI?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Yes, this feature introduced in the FLIP is a super-set of the
> > >> > original
> > >> > > > > > feature that is controlled by
> > >> > 'historyserver.archive.retained-jobs'.
> > >> > > > > >
> > >> > > > > > So if I understand correctly, after the new feature is
> > >> introduced,
> > >> > it
> > >> > > > > > would affect the retention period of remote distributed
> > storage
> > >> > jobs
> > >> > > > > > history files as well, not only for what is shown in the UI.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Yuepeng Pan
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot"
> > >> > > > > > <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
> > >> > > > > > >I took a look. Overall it would be nice to have more ways to
> > >> > > > configure the
> > >> > > > > > >History Server.
> > >> > > > > > >
> > >> > > > > > >Question: Is the History Server then going to delete the
> > files
> > >> > stored?
> > >> > > > > > >(i.e. we use GCS, would it delete the files there as well?)
> > >> > > > > > >Or is this strictly what is shown in the UI?
> > >> > > > > > >
> > >> > > > > > >Ryan van Huuksloot
> > >> > > > > > >Staff Engineer, Infrastructure | Streaming Platform
> > >> > > > > > >[image: Shopify]
> > >> > > > > > ><
> > >> > > >
> > >> >
> > https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <
> > >> > panyuep...@apache.org>
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > >> Bumping this thread. Thanks!
> > >> > > > > > >>
> > >> > > > > > >> Best,
> > >> > > > > > >> Yuepeng Pan
> > >> > > > > > >>
> > >> > > > > > >> On 2025/08/11 03:49:27 Yuepeng Pan wrote:
> > >> > > > > > >> > Hi community,
> > >> > > > > > >> >
> > >> > > > > > >> >
> > >> > > > > > >> > Currently, HistoryServer supports only a quantity-based
> > job
> > >> > > > archive
> > >> > > > > > >> retention policy [1].
> > >> > > > > > >> > This is insufficient for scenarios such as:
> > >> > > > > > >> > - Time-based retention (e.g., last X days).
> > >> > > > > > >> > - Combined rules (e.g., within 7 days AND ≤100 jobs).
> > >> > > > > > >> >
> > >> > > > > > >> >
> > >> > > > > > >> > To address these limitations, I’d like to start a
> > discussion
> > >> > on
> > >> > > > > > FLIP-490
> > >> > > > > > >> [2],
> > >> > > > > > >> > which proposes a more flexible job archive retention
> > >> mechanism
> > >> > > > that
> > >> > > > > > >> supports time-based, quantity-based, and composite
> > strategies
> > >> > (with
> > >> > > > > > AND/OR
> > >> > > > > > >> logic).
> > >> > > > > > >> >
> > >> > > > > > >> >
> > >> > > > > > >> > Looking forward to your feedback.
> > >> > > > > > >> >
> > >> > > > > > >> >
> > >> > > > > > >> > Best,
> > >> > > > > > >> > Yuepeng Pan
> > >> > > > > > >> >
> > >> > > > > > >> >
> > >> > > > > > >> > [1]
> > >> > > > > > >>
> > >> > > > > >
> > >> > > >
> > >> >
> > >>
> > https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
> > >> > > > > > >> > [2]
> > >> > > > > > >>
> > >> > > > > >
> > >> > > >
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
> > >> > > > > > >>
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> 
