Hi Yuepeng,

I think it would be better to keep the configurations straightforward
rather than conditional, if possible.
How about just adding a single TTL config? We would remove a job archive
either when its TTL has passed, or when the retained job count has been
reached and the job is the earliest job.
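
A minimal sketch of the rule proposed above, written in Python purely for illustration (the HistoryServer itself is Java). The `Archive` type, `archives_to_delete`, and the convention that `None` disables a limit are hypothetical names and assumptions, not existing Flink APIs. Note that this rule deletes an archive as soon as either limit is violated, so an archive is kept only while it satisfies both.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Archive:
    # Hypothetical stand-in for one job archive file in historyserver.archive.fs.dir.
    job_id: str
    modified_ms: int  # last modification time, epoch millis


def archives_to_delete(
    archives: List[Archive],
    now_ms: int,
    ttl_ms: Optional[int],
    max_retained: Optional[int],
) -> List[Archive]:
    """Select archives to remove under one TTL plus a retained-job count.

    An archive is removed if its TTL has passed, or if it falls outside the
    newest `max_retained` archives (i.e. it is one of the earliest jobs).
    Passing None for either limit disables it (an assumption of this sketch).
    """
    # Newest first, so the earliest archives end up at the tail.
    ordered = sorted(archives, key=lambda a: a.modified_ms, reverse=True)
    doomed = []
    for rank, archive in enumerate(ordered):
        expired = ttl_ms is not None and now_ms - archive.modified_ms > ttl_ms
        over_count = max_retained is not None and rank >= max_retained
        if expired or over_count:
            doomed.append(archive)
    return doomed
```

For example, with `now_ms=10_000`, `ttl_ms=5_000` and `max_retained=2`, archives modified at 9,000, 8,000, 7,000 and 2,000 ms would lose the one at 2,000 (expired) and the one at 7,000 (beyond the newest two).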

Thanks,

Jiangjie (Becket) Qin




On Fri, Aug 22, 2025 at 2:55 AM Yuepeng Pan <panyuep...@apache.org> wrote:

> Would anyone like to discuss this FLIP?
> I'd appreciate your feedback and suggestions.
>
> Best,
> Yuepeng Pan
>
> On 2025/08/20 07:13:44 Yuepeng Pan wrote:
> > Hi, Becket.
> >
> > Thank you for the clarification.
> > Let me try to revisit these two questions with an explanation:
> >
> > > I meant to ask what is the use case for
> > > ttlOrQuantity mode? Is it sufficient to delete the job archive when
> > > either TTL or quantity is reached if both are set?
> >
> > As the name of the configuration key 'historyserver.archive.retained-jobs.mode'
> > suggests, this option governs the retention mode for archived historical jobs.
> > When it is set to 'ttlOrQuantity', a target file is retained if either of
> > the following conditions is met (in other words, it is deleted only if
> > neither condition is satisfied):
> >
> > - The file count is within the maximum retention threshold.
> > - The file remains within the TTL (Time to Live) period.
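
The two retention modes discussed in this thread can be summarised as a per-file predicate. This is an illustrative Python sketch: the enum and `should_retain` are hypothetical, and `within_count` is assumed to mean that the file is among the newest N files allowed by the quantity threshold.

```python
from enum import Enum


class RetentionMode(Enum):
    # Mode names taken from the FLIP discussion; the enum itself is illustrative.
    TTL_OR_QUANTITY = "ttlOrQuantity"
    TTL_AND_QUANTITY = "ttlAndQuantity"


def should_retain(mode: RetentionMode, within_ttl: bool, within_count: bool) -> bool:
    """Decide whether a single archive file is kept.

    within_ttl:   the file's modification time is still inside the TTL.
    within_count: the file is among the newest files allowed by the count limit.
    """
    if mode is RetentionMode.TTL_OR_QUANTITY:
        # Kept if either condition holds; deleted only when both fail.
        return within_ttl or within_count
    # ttlAndQuantity: kept only if both conditions hold.
    return within_ttl and within_count
```

The modes only disagree when exactly one condition holds, which is the case the question above is about.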
> >
> > >Regarding the case when there are multiple history server instances, if we
> > >don't enforce a behavior, users can go with either a) or b), and it would
> > >just be up to the user to choose. We need to document the behavior properly.
> >
> > Thanks for the comment. I added the related content as a note [1] on the
> > new configuration 'historyserver.archive.retained-jobs.mode'.
> > In the subsequent implementation phase, this description will be refined
> > and added to the corresponding configuration documentation.
> >
> > Best,
> > Yuepeng Pan.
> >
> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-PublicInterfaces
> >
> >
> >
> > On 2025/08/20 04:55:24 Becket Qin wrote:
> > > Hi Yuepeng,
> > >
> > > Sorry for the confusion. I meant to ask what is the use case for
> > > ttlOrQuantity mode? Is it sufficient to delete the job archive when
> > > either TTL or quantity is reached if both are set?
> > >
> > > Regarding the case when there are multiple history server instances, if we
> > > don't enforce a behavior, users can go with either a) or b), and it would
> > > just be up to the user to choose. We need to document the behavior properly.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Mon, Aug 18, 2025 at 10:28 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > >
> > > > Hi, Becket.
> > > >
> > > > Thank you for your comments.
> > > >
> > > > > 1. What is the use case for ttlAndQuantity mode? It seems usually the
> > > > > desired behavior is ttlOrQuantity. If so, we can just add a ttl
> > > > > retention config.
> > > >
> > > > The ttlAndQuantity mode means that a file in the remote directory is
> > > > retained only if its modification time is within the TTL and the total
> > > > number of files does not exceed the maximum limit.
> > > >
> > > > One of the main purposes of this configuration item is to guard against
> > > > the following situations:
> > > >
> > > > - Within the TTL, the number of files grows too large, leading to
> > > > excessive storage usage or too many files.
> > > >
> > > > - Files remain within the file quantity threshold, but their modification
> > > > times far exceed the TTL.
> > > >
> > > >
> > > >
> > > >
> > > > > 2. When there are multiple history server instances with different
> > > > > configurations, they are working independently today and may have
> > > > > conflicting configs. This is an existing problem, but since we are adding
> > > > > more configs to the retention policy, it increases the chance of config
> > > > > conflicts. It would be good to have a clear user story for when there
> > > > > are multiple history server instances.
> > > >
> > > >
> > > >
> > > >
> > > > This is indeed a good question.
> > > >
> > > > What do you think about adding a description like the following to the
> > > > section on the newly introduced configuration item in the FLIP?
> > > >
> > > > a. If multiple HistoryServer instances use the same
> > > > historyserver.archive.fs.dir directory as their refresh directory,
> > > > you should enable and configure this feature in only one HistoryServer
> > > > instance, to avoid errors caused by multiple instances cleaning up
> > > > remote files simultaneously.
> > > >
> > > > -OR-
> > > >
> > > > b. If multiple HistoryServer instances use the same
> > > > historyserver.archive.fs.dir directory as their refresh directory,
> > > > you need to keep the value of this configuration consistent across them.
> > > >
> > > >
> > > >
> > > >
> > > > Regardless of whether option a or option b is chosen, the exception
> > > > handling for reading and deleting remote files needs to be enhanced.
> > > >
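
A minimal sketch of the kind of exception handling meant here, assuming that a file vanishing between listing and deletion is a normal outcome when several instances share historyserver.archive.fs.dir. It uses Python's `os.remove` on a local path as a stand-in; the HistoryServer would go through Flink's own `org.apache.flink.core.fs.FileSystem` abstraction instead, and `delete_archive_quietly` is a hypothetical name.

```python
import logging
import os

LOG = logging.getLogger("archive-cleanup")


def delete_archive_quietly(path: str) -> bool:
    """Delete one archive file, tolerating concurrent cleanup by another instance.

    Returns True if this call removed the file, False if it was already gone
    or could not be removed. Errors are logged rather than raised, so one
    failed deletion does not abort the rest of the cleanup pass.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Another HistoryServer sharing the directory may have deleted it first.
        LOG.debug("Archive %s was already removed", path)
        return False
    except OSError as exc:
        # E.g. a permission problem; the next refresh can retry.
        LOG.warning("Could not delete archive %s: %s", path, exc)
        return False
```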
> > > >
> > > >
> > > >
> > > > I’m really looking forward to hearing about other suitable solutions
> > > > for the above items.
> > > >
> > > > Please let me know your opinion.
> > > >
> > > > Best,
> > > > Yuepeng Pan
> > > >
> > > >
> > > > At 2025-08-19 00:37:26, "Becket Qin" <becket....@gmail.com> wrote:
> > > > >Thanks for the proposal, Yuepeng.
> > > > >
> > > > >I think this FLIP is mostly orthogonal to FLIP-505. This FLIP essentially
> > > > >tries to improve the retention policy of the actual archives, while
> > > > >FLIP-505 mainly focuses on caching. One connection between the two FLIPs
> > > > >might be that when an actual archive expires and gets removed, it might
> > > > >make sense to also remove the local cache.
> > > > >
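
One way to picture the connection described above, as a sketch only: after a retention pass deletes remote archives, the same job IDs are evicted from the local cache. The one-directory-per-job layout under `cache_root` and the `evict_local_cache` helper are hypothetical; neither this FLIP nor FLIP-505 is assumed to use them.

```python
import os
import shutil
from typing import Iterable, List


def evict_local_cache(cache_root: str, removed_job_ids: Iterable[str]) -> List[str]:
    """Remove local cache entries for jobs whose remote archive was deleted.

    Assumes a hypothetical layout with one sub-directory per job ID under
    `cache_root`. Returns the job IDs whose cache entry is now gone.
    """
    evicted = []
    for job_id in removed_job_ids:
        entry = os.path.join(cache_root, job_id)
        if os.path.isdir(entry):
            shutil.rmtree(entry, ignore_errors=True)
            # Only report the entry if the removal actually succeeded.
            if not os.path.exists(entry):
                evicted.append(job_id)
    return evicted
```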
> > > > >A few questions about this FLIP:
> > > > >
> > > > >1. What is the use case for ttlAndQuantity mode? It seems usually the
> > > > >desired behavior is ttlOrQuantity. If so, we can just add a ttl retention
> > > > >config.
> > > > >2. When there are multiple history server instances with different
> > > > >configurations, they are working independently today and may have
> > > > >conflicting configs. This is an existing problem, but since we are adding
> > > > >more configs to the retention policy, it increases the chance of config
> > > > >conflicts. It would be good to have a clear user story for when there are
> > > > >multiple history server instances.
> > > > >
> > > > >Thanks,
> > > > >
> > > > >Jiangjie (Becket) Qin
> > > > >
> > > > >On Thu, Aug 14, 2025 at 1:56 PM Allison <achang5...@gmail.com> wrote:
> > > > >
> > > > >> Hi Yuepeng,
> > > > >>
> > > > >> It looks like this work could have some synergy with the change I've
> > > > >> proposed in FLIP-505. It addresses the question Ryan asked about
> > > > >> whether remotely stored job archives will be impacted if the retention
> > > > >> is changed. Feel free to take a look at the FLIP as well as the PR for
> > > > >> FLIP-505. It looks like we have the opportunity to significantly
> > > > >> improve the History Server with these two changes.
> > > > >>
> > > > >> FLIP-505:
> > > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> > > > >> PR: https://github.com/apache/flink/pull/26878
> > > > >>
> > > > >> Best,
> > > > >> Allison
> > > > >>
> > > > >>
> > > > >> On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > >>
> > > > >> > Hi, Ryan van Huuksloot.
> > > > >> >
> > > > >> > > Might be worth stating that explicitly in the FLIP.
> > > > >> > Nice idea! I added a sub-section [1] to clarify this point.
> > > > >> >
> > > > >> > Thanks a lot !
> > > > >> >
> > > > >> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
> > > > >> >
> > > > >> > Best,
> > > > >> > Yuepeng Pan
> > > > >> >
> > > > >> > On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
> > > > >> > > That sounds like a good option.
> > > > >> > >
> > > > >> > > Might be worth stating that explicitly in the FLIP.
> > > > >> > >
> > > > >> > > No other questions from me - will be a nice extension!
> > > > >> > >
> > > > >> > > Ryan van Huuksloot
> > > > >> > > Staff Engineer, Infrastructure | Streaming Platform
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > >> > >
> > > > >> > > > Hi, Ryan van Huuksloot.
> > > > >> > > >
> > > > >> > > > >Are you planning on having a thread to check for TTL? Or what is the plan
> > > > >> > > > >for TTL?
> > > > >> > > > >The quantity based would have a check when a new job is archived?
> > > > >> > > >
> > > > >> > > > Following the implementation in the POC [1], we could keep the existing
> > > > >> > > > process, in which the HistoryServer#start method periodically invokes
> > > > >> > > > HistoryServerArchiveFetcher#fetchArchives based on
> > > > >> > > > 'historyserver.archive.fs.refresh-interval', and check there whether
> > > > >> > > > target files should be retained. What do you think about it?
> > > > >> > > > Of course, I'm very open to hearing about other, potentially better
> > > > >> > > > implementation approaches.
> > > > >> > > > Please let me know your opinion.
> > > > >> > > > Thank you.
> > > > >> > > >
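
A sketch of the scheduling question, assuming the retention check simply runs on the same fixed-delay cadence as the archive refresh ('historyserver.archive.fs.refresh-interval'). It is written in Python for illustration; `start_periodic_retention_check` is a hypothetical helper, and the Java side would more naturally use a `ScheduledExecutorService`.

```python
import logging
import threading
from typing import Callable

LOG = logging.getLogger("retention-check")


def start_periodic_retention_check(
    check: Callable[[], None], interval_s: float, stop_event: threading.Event
) -> threading.Thread:
    """Run `check` once immediately, then every `interval_s` seconds until stopped."""

    def loop() -> None:
        while True:
            try:
                check()
            except Exception:
                # Keep the schedule alive; the next refresh retries the check.
                LOG.exception("Retention check failed")
            # Event.wait returns True as soon as the event is set, ending the loop.
            if stop_event.wait(interval_s):
                return

    thread = threading.Thread(target=loop, name="retention-check", daemon=True)
    thread.start()
    return thread
```

Running the check on every refresh would evaluate the TTL and the quantity condition at the same point, so neither a separate TTL thread nor a per-archive hook would be needed.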
> > > > >> > > > [1] https://github.com/apache/flink/pull/26902
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Yuepeng Pan
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
> > > > >> > > > > Thanks, sounds good.
> > > > >> > > > >
> > > > >> > > > > Are you planning on having a thread to check for TTL? Or what is the plan
> > > > >> > > > > for TTL?
> > > > >> > > > > The quantity based would have a check when a new job is archived?
> > > > >> > > > >
> > > > >> > > > > Ryan van Huuksloot
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi, Ryan van Huuksloot.
> > > > >> > > > > >
> > > > >> > > > > > Thank you very much for your reply.
> > > > >> > > > > >
> > > > >> > > > > > > Question: Is the History Server then going to delete the files stored?
> > > > >> > > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > > > >> > > > > > > Or is this strictly what is shown in the UI?
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > Yes, the feature introduced in this FLIP is a superset of the original
> > > > >> > > > > > feature controlled by 'historyserver.archive.retained-jobs'.
> > > > >> > > > > >
> > > > >> > > > > > So, if I understand correctly, after the new feature is introduced, it
> > > > >> > > > > > would affect the retention of job history files in remote distributed
> > > > >> > > > > > storage as well, not only what is shown in the UI.
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > Best,
> > > > >> > > > > > Yuepeng Pan
> > > > >> > > > > >
> > > > >> > > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot" <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
> > > > >> > > > > > >I took a look. Overall it would be nice to have more ways to configure the
> > > > >> > > > > > >History Server.
> > > > >> > > > > > >
> > > > >> > > > > > >Question: Is the History Server then going to delete the files stored?
> > > > >> > > > > > >(i.e. we use GCS, would it delete the files there as well?)
> > > > >> > > > > > >Or is this strictly what is shown in the UI?
> > > > >> > > > > > >
> > > > >> > > > > > >Ryan van Huuksloot
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > >On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > > >> > > > > > >
> > > > >> > > > > > >> Bumping this thread. Thanks!
> > > > >> > > > > > >>
> > > > >> > > > > > >> Best,
> > > > >> > > > > > >> Yuepeng Pan
> > > > >> > > > > > >>
> > > > >> > > > > > >> On 2025/08/11 03:49:27 Yuepeng Pan wrote:
> > > > >> > > > > > >> > Hi community,
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Currently, HistoryServer supports only a quantity-based job archive
> > > > >> > > > > > >> > retention policy [1].
> > > > >> > > > > > >> > This is insufficient for scenarios such as:
> > > > >> > > > > > >> > - Time-based retention (e.g., last X days).
> > > > >> > > > > > >> > - Combined rules (e.g., within 7 days AND ≤100 jobs).
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > To address these limitations, I’d like to start a discussion on FLIP-490 [2],
> > > > >> > > > > > >> > which proposes a more flexible job archive retention mechanism that
> > > > >> > > > > > >> > supports time-based, quantity-based, and composite strategies (with
> > > > >> > > > > > >> > AND/OR logic).
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Looking forward to your feedback.
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Best,
> > > > >> > > > > > >> > Yuepeng Pan
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > [1] https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
> > > > >> > > > > > >> > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
> > > > >> > > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
>
