Would anyone like to discuss this FLIP? 
I'd appreciate your feedback and suggestions.

Best,
Yuepeng Pan

On 2025/08/20 07:13:44 Yuepeng Pan wrote:
> Hi, Becket.
> 
> Thank you for the clarification.
> Please let me try to revisit these two questions with an explanation:
> 
> > I meant to ask what is the use case for 
> > ttlOrQuantity mode? Is it sufficient to delete the job archive when either 
> > TTL or quantity is reached if both are set?
> 
> As the configuration key 'historyserver.archive.retained-jobs.mode'
> suggests, this policy governs the retention mode for archived historical jobs.
> When set to 'ttlOrQuantity', a target file will be retained if either of the
> following conditions is met (in other words, deletion occurs only if both
> conditions are unsatisfied):
> 
> - The file count is within the maximum retention threshold.
> - The file remains within the TTL (Time to Live) period.
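
For illustration, the retention decision described above could be sketched roughly as follows. This is only a sketch under assumptions: the class RetentionModeSketch, its Mode enum, and the shouldRetain method are hypothetical names that are not part of Flink or the FLIP, and 'rank' is assumed to be the 0-based position of an archive when the archives are sorted newest first. The ttlAndQuantity branch follows the definition given later in this thread (retain only if both conditions hold).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not Flink code: decides whether a single archive is kept.
final class RetentionModeSketch {

    enum Mode { TTL_OR_QUANTITY, TTL_AND_QUANTITY }

    /**
     * @param rank     assumed 0-based position of the archive, newest first
     * @param maxJobs  maximum number of archives to retain
     * @param modified modification time of the archive
     * @param ttl      configured time to live
     * @param now      current time, passed in so the check is deterministic
     */
    static boolean shouldRetain(Mode mode, int rank, int maxJobs,
                                Instant modified, Duration ttl, Instant now) {
        // Condition 1: the file count is within the maximum retention threshold.
        boolean withinQuantity = rank < maxJobs;
        // Condition 2: the file is still within its TTL.
        boolean withinTtl = Duration.between(modified, now).compareTo(ttl) <= 0;

        return mode == Mode.TTL_OR_QUANTITY
                ? withinQuantity || withinTtl   // deleted only if both fail
                : withinQuantity && withinTtl;  // deleted as soon as one fails
    }
}
```

With a TTL of 7 days and a limit of 100 jobs, a one-day-old archive at rank 500 is kept under ttlOrQuantity and deleted under ttlAndQuantity.
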
> 
> > Regarding the case when there are multiple history server instances, if we
> > don't enforce a behavior, users can go with either a) and b), and it would
> > just be up to the user to choose. We need to document the behavior properly.
> 
> Thanks for the comment. I have added the related content as a note [1] on
> the new configuration 'historyserver.archive.retained-jobs.mode'.
> In the subsequent implementation phase, this part of the description will be
> refined and added to the corresponding configuration documentation.
> 
> Best,
> Yuepeng Pan.
> 
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-PublicInterfaces
>
> On 2025/08/20 04:55:24 Becket Qin wrote:
> > Hi Yuepeng,
> > 
> > Sorry for the confusion. I meant to ask what is the use case for
> > ttlOrQuantity mode? Is it sufficient to delete the job archive when either
> > TTL or quantity is reached if both are set?
> > 
> > Regarding the case when there are multiple history server instances, if we
> > don't enforce a behavior, users can go with either a) and b), and it would
> > just be up to the user to choose. We need to document the behavior properly.
> > 
> > Thanks,
> > 
> > Jiangjie (Becket) Qin
> > 
> > 
> > On Mon, Aug 18, 2025 at 10:28 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > 
> > > Hi, Becket.
> > >
> > > Thank you for your comments.
> > >
> > > > 1. What is the use case for ttlAndQuantity mode? It seems usually the
> > > > desired behavior is ttlOrQuantity. If so, we can just add a ttl
> > > > retention config.
> > >
> > > The ttlAndQuantity mode means that files in the remote directory can only
> > > be retained if their modification time is within the valid TTL
> > > and the total number of files does not exceed the maximum limit.
> > >
> > > One of the main purposes of this configuration item is to impose
> > > restrictions on the following situations:
> > >
> > > - Within the TTL, the number of files grows too large, leading to
> > > excessive storage usage or too many files.
> > >
> > > - Files remain within the file quantity threshold, but their modification
> > > times far exceed the TTL.
> > >
> > > > 2. When there are multiple history server instances with different
> > > > configurations, they are working independently today and may have
> > > > conflict configs. This is an existing problem, but since we are adding
> > > > more configs to the retention policy, it increases the chance of config
> > > > conflicts. It would be good to have a clear user story for when there
> > > > are multiple history server instances.
> > >
> > > This is indeed a good question.
> > >
> > > What do you think if we add a description like the following to the newly
> > > introduced configuration item section in the FLIP?
> > >
> > > a. If there are multiple HistoryServer instances using the same
> > > historyserver.archive.fs.dir directory as the refresh directory,
> > > you should enable and configure this feature in only one HistoryServer
> > > instance to avoid errors caused by multiple instances simultaneously
> > > cleaning up remote files.
> > >
> > > -OR-
> > >
> > > b. If there are multiple HistoryServer instances using the same
> > > historyserver.archive.fs.dir directory as the refresh directory,
> > > you need to keep the value of this configuration consistent across them.
> > >
> > > Regardless of whether option a or option b is chosen, it is necessary to
> > > enhance the corresponding exception handling when reading from and
> > > deleting remote files.
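
One way to make the cleanup tolerant of another instance having already removed the same archive is to treat "file not found" as a non-error. The following is a minimal sketch under assumptions: it uses local java.nio paths as a stand-in for the remote file system (as far as I know, the HistoryServer goes through Flink's own FileSystem abstraction, which is not shown here), and TolerantCleanupSketch and deleteQuietly are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Flink code: deletes an archive without failing
// when a concurrent cleaner has already removed it.
final class TolerantCleanupSketch {

    /**
     * @return true if this call removed the file, false if it was already gone
     * @throws IOException for real I/O failures (permissions, non-empty directory, ...)
     */
    static boolean deleteQuietly(Path archive) throws IOException {
        // Files.deleteIfExists does not throw when the file no longer exists,
        // so a lost race against another instance is reported as 'false'.
        return Files.deleteIfExists(archive);
    }
}
```

A caller can log the 'false' case at debug level and continue with the next archive instead of aborting the whole cleanup pass.
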
> > >
> > > I’m looking forward to hearing other suggestions for resolving the
> > > above items.
> > >
> > > Please let me know your opinion.
> > >
> > > Best,
> > > Yuepeng Pan
> > >
> > >
> > > At 2025-08-19 00:37:26, "Becket Qin" <becket....@gmail.com> wrote:
> > > >Thanks for the proposal, Yuepeng.
> > > >
> > > >I think this FLIP is mostly orthogonal to FLIP-505. This FLIP essentially
> > > >tries to improve the retention policy of the actual archives, while
> > > >FLIP-505 mainly focuses on caching. One connection between the two FLIPs
> > > >might be when the actual archive expires and gets removed, it might make
> > > >sense to also remove the local cache.
> > > >
> > > > >A few questions about this FLIP:
> > > >
> > > >1. What is the use case for ttlAndQuantity mode? It seems usually the
> > > >desired behavior is ttlOrQuantity. If so, we can just add a ttl retention
> > > >config.
> > > > >2. When there are multiple history server instances with different
> > > > >configurations, they are working independently today and may have
> > > > >conflict configs. This is an existing problem, but since we are adding
> > > > >more configs to the retention policy, it increases the chance of config
> > > > >conflicts. It would be good to have a clear user story for when there
> > > > >are multiple history server instances.
> > > >
> > > >Thanks,
> > > >
> > > >Jiangjie (Becket) Qin
> > > >
> > > >On Thu, Aug 14, 2025 at 1:56 PM Allison <achang5...@gmail.com> wrote:
> > > >
> > > >> Hi Yuepeng,
> > > >>
> > > > >> Looks like this work can have some symbiosis with the change that I've
> > > > >> proposed here in FLIP-505. This addresses the question that Ryan asked
> > > > >> about whether or not remotely stored job archives will be impacted if
> > > > >> the retention is changed. Feel free to take a look at the FLIP as well
> > > > >> as the PR for FLIP-505. Looks like we have the opportunity to
> > > > >> significantly improve the History server with these two changes.
> > > >>
> > > > >> FLIP-505:
> > > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
> > > >> PR: https://github.com/apache/flink/pull/26878
> > > >>
> > > >> Best,
> > > >> Allison
> > > >>
> > > >>
> > > > >> On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > >>
> > > >> > Hi, Ryan van Huuksloot.
> > > >> >
> > > >> > > Might be worth stating that explicitly in the FLIP.
> > > > >> > Nice idea. I added a sub-section here [1] to clarify the item.
> > > >> >
> > > >> > Thanks a lot !
> > > >> >
> > > >> > [1]
> > > >> >
> > > >>
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer
> > > >> > -Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
> > > >> >
> > > >> > Best,
> > > >> > Yuepeng Pan
> > > >> >
> > > >> > On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
> > > >> > > That sounds like a good option.
> > > >> > >
> > > >> > > Might be worth stating that explicitly in the FLIP.
> > > >> > >
> > > >> > > No other questions from me - will be a nice extension!
> > > >> > >
> > > >> > > Ryan van Huuksloot
> > > >> > > Staff Engineer, Infrastructure | Streaming Platform
> > > > >> > > [image: Shopify]
> > > > >> > > <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
> > > >> > >
> > > >> > >
> > > > >> > > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > >> > >
> > > > >> > > > Hi, Ryan van Huuksloot.
> > > >> > > >
> > > > >> > > > >Are you planning on having a thread to check for TTL? Or what is the plan
> > > > >> > > > >for TTL?
> > > > >> > > > >The quantity based would have a check when a new job is archived?
> > > >> > > >
> > > > >> > > > As in the POC [1], we could keep the existing process in which the
> > > > >> > > > HistoryServer#start method periodically invokes
> > > > >> > > > HistoryServerArchiveFetcher#fetchArchives based on
> > > > >> > > > 'historyserver.archive.fs.refresh-interval', and check there
> > > > >> > > > whether target files should be retained. What do you think?
> > > > >> > > > Of course, I'm very open to hearing about other, potentially better
> > > > >> > > > implementation approaches.
> > > > >> > > > Please let me know your opinion.
> > > > >> > > > Thank you.
> > > >> > > >
> > > >> > > > [1] https://github.com/apache/flink/pull/26902
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Yuepeng Pan
> > > >> > > >
> > > >> > > >
> > > >> > > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
> > > >> > > > > Thanks, sounds good.
> > > >> > > > >
> > > > >> > > > > Are you planning on having a thread to check for TTL? Or what is the plan
> > > > >> > > > > for TTL?
> > > > >> > > > > The quantity based would have a check when a new job is archived?
> > > >> > > > >
> > > >> > > > > Ryan van Huuksloot
> > > >> > > > >
> > > > >> > > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > >> > > > >
> > > >> > > > > > Hi, Ryan van Huuksloot.
> > > >> > > > > >
> > > > >> > > > > > Thank you very much for your reply.
> > > > >> > > > > >
> > > > >> > > > > > > Question: Is the History Server then going to delete the files stored?
> > > > >> > > > > > > (i.e. we use GCS, would it delete the files there as well?)
> > > > >> > > > > > > Or is this strictly what is shown in the UI?
> > > > >> > > > > >
> > > > >> > > > > > Yes. The feature introduced in this FLIP is a superset of the
> > > > >> > > > > > original feature controlled by 'historyserver.archive.retained-jobs'.
> > > > >> > > > > >
> > > > >> > > > > > So, if I understand correctly, once the new feature is introduced,
> > > > >> > > > > > it will also affect the retention of job history files in remote
> > > > >> > > > > > distributed storage, not only what is shown in the UI.
> > > > >> > > > > >
> > > >> > > > > > Best,
> > > >> > > > > > Yuepeng Pan
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot"
> > > >> > > > > > <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
> > > > >> > > > > > >I took a look. Overall it would be nice to have more ways to configure the
> > > > >> > > > > > >History Server.
> > > > >> > > > > > >
> > > > >> > > > > > >Question: Is the History Server then going to delete the files stored?
> > > > >> > > > > > >(i.e. we use GCS, would it delete the files there as well?)
> > > > >> > > > > > >Or is this strictly what is shown in the UI?
> > > >> > > > > > >
> > > >> > > > > > >Ryan van Huuksloot
> > > >> > > > > > >
> > > > >> > > > > > >On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <panyuep...@apache.org> wrote:
> > > >> > > > > > >
> > > >> > > > > > >> Bumping this thread. Thanks!
> > > >> > > > > > >>
> > > >> > > > > > >> Best,
> > > >> > > > > > >> Yuepeng Pan
> > > >> > > > > > >>
> > > >> > > > > > >> On 2025/08/11 03:49:27 Yuepeng Pan wrote:
> > > >> > > > > > >> > Hi community,
> > > >> > > > > > >> >
> > > >> > > > > > >> >
> > > > >> > > > > > >> > Currently, HistoryServer supports only a quantity-based job archive
> > > > >> > > > > > >> > retention policy [1].
> > > > >> > > > > > >> > This is insufficient for scenarios such as:
> > > > >> > > > > > >> > - Time-based retention (e.g., last X days).
> > > > >> > > > > > >> > - Combined rules (e.g., within 7 days AND ≤100 jobs).
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > To address these limitations, I’d like to start a discussion on
> > > > >> > > > > > >> > FLIP-490 [2], which proposes a more flexible job archive retention
> > > > >> > > > > > >> > mechanism that supports time-based, quantity-based, and composite
> > > > >> > > > > > >> > strategies (with AND/OR logic).
> > > >> > > > > > >> >
> > > >> > > > > > >> >
> > > >> > > > > > >> > Looking forward to your feedback.
> > > >> > > > > > >> >
> > > >> > > > > > >> >
> > > >> > > > > > >> > Best,
> > > >> > > > > > >> > Yuepeng Pan
> > > >> > > > > > >> >
> > > >> > > > > > >> >
> > > >> > > > > > >> > [1]
> > > >> > > > > > >>
> > > >> > > > > >
> > > >> > > >
> > > >> >
> > > >>
> > > https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
> > > >> > > > > > >> > [2]
> > > >> > > > > > >>
> > > >> > > > > >
> > > >> > > >
> > > >> >
> > > >>
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
> > > >> > > > > > >>
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> > 
> 