Hi, Becket.

Thank you for your comments.

> 1. What is the use case for ttlAndQuantity mode? It seems usually the
> desired behavior is ttlOrQuantity. If so, we can just add a ttl retention
> config.




In ttlAndQuantity mode, a file in the remote directory is retained only if its 
modification time is within the TTL and the total number of files does not 
exceed the maximum limit.

One of the main purposes of this mode is to guard against the following 
situations:

- Within the TTL, the number of files grows too large, leading to excessive 
storage usage or an unmanageable file count.

- The number of files stays within the quantity threshold, but some 
modification times are far beyond the TTL.
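
To make these semantics concrete, here is a minimal Java sketch of the 
ttlAndQuantity check. It is an illustration only: the class RetentionSketch, 
the Archive type and the method retainTtlAndQuantity are hypothetical names 
chosen for this example, not code from the POC or a Flink API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical names, not part of Flink or the FLIP. */
public class RetentionSketch {

    /** Minimal stand-in for an archive file in the remote directory. */
    public static final class Archive {
        private final String jobId;
        private final Instant modifiedAt;

        public Archive(String jobId, Instant modifiedAt) {
            this.jobId = jobId;
            this.modifiedAt = modifiedAt;
        }

        public String getJobId() {
            return jobId;
        }

        public Instant getModifiedAt() {
            return modifiedAt;
        }
    }

    /**
     * Returns the archives to keep, newest first. An archive is kept only if
     * it is within the TTL AND the number of kept archives stays within maxJobs.
     */
    public static List<Archive> retainTtlAndQuantity(
            List<Archive> archives, Duration ttl, int maxJobs, Instant now) {
        List<Archive> newestFirst = new ArrayList<>(archives);
        newestFirst.sort(Comparator.comparing(Archive::getModifiedAt).reversed());

        Instant cutoff = now.minus(ttl);
        List<Archive> kept = new ArrayList<>();
        for (Archive archive : newestFirst) {
            boolean withinTtl = !archive.getModifiedAt().isBefore(cutoff);
            boolean withinQuantity = kept.size() < maxJobs;
            if (withinTtl && withinQuantity) {
                kept.add(archive);
            }
        }
        return kept;
    }
}
```

For example, with a TTL of 7 days and a limit of 2, archives modified 1, 2, 3 
and 10 days ago would keep only the first two: the 10-day-old archive fails 
the TTL condition, and the 3-day-old archive fails the quantity condition.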




> 2. When there are multiple history server instances with different
> configurations, they are working independently today and may have conflict
> configs. This is an existing problem, but since we are adding more configs
> to the retention policy, it increases the chance of config conflicts. It
> would be good to have a clear user story for when there are multiple
> history server instances.




This is indeed a good question. 

What do you think about adding a description like the following to the section 
of the FLIP that introduces the new configuration options?

a. If multiple HistoryServer instances use the same 
historyserver.archive.fs.dir directory as the refresh directory, enable and 
configure this feature in only one of them, to avoid errors caused by multiple 
instances cleaning up remote files simultaneously.

-OR-

b. If multiple HistoryServer instances use the same 
historyserver.archive.fs.dir directory as the refresh directory, keep the 
value of this configuration consistent across all of them.

Whichever option is chosen, the exception handling for reading and deleting 
remote files needs to be enhanced accordingly.
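
One possible shape for that handling is to treat "already deleted by another 
instance" as a normal outcome rather than an error. The sketch below is only 
an illustration under stated assumptions: it uses java.nio.file on a local 
path as a stand-in for the remote file system (the real code would go through 
Flink's file system layer, which is not shown), and the class and method 
names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: hypothetical names, local paths stand in for remote files. */
public class TolerantCleanupSketch {

    /**
     * Deletes the given archive if it still exists.
     *
     * @return true if this caller deleted it, false if it was already gone
     *     (for example, because another instance cleaned it up first)
     */
    public static boolean deleteTolerantly(Path archive) {
        try {
            // deleteIfExists does not throw when the path is absent.
            return Files.deleteIfExists(archive);
        } catch (IOException e) {
            // Other I/O failures are still surfaced to the caller.
            throw new UncheckedIOException("Failed to delete " + archive, e);
        }
    }
}
```

With this shape, two instances racing to clean up the same archive would both 
complete normally: one call returns true and the other returns false.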




I would be glad to hear other suggestions for resolving the above items.

Please let me know your opinion.

Best,
Yuepeng Pan

At 2025-08-19 00:37:26, "Becket Qin" <becket....@gmail.com> wrote:
>Thanks for the proposal, Yuepeng.
>
>I think this FLIP is mostly orthogonal to FLIP-505. This FLIP essentially
>tries to improve the retention policy of the actual archives, while
>FLIP-505 mainly focuses on caching. One connection between the two FLIPs
>might be when the actual archive expires and gets removed, it might make
>sense to also remove the local cache.
>
>A few questions about this FLIP:
>
>1. What is the use case for ttlAndQuantity mode? It seems usually the
>desired behavior is ttlOrQuantity. If so, we can just add a ttl retention
>config.
>2. When there are multiple history server instances with different
>configurations, they are working independently today and may have conflict
>configs. This is an existing problem, but since we are adding more configs
>to the retention policy, it increases the chance of config conflicts. It
>would be good to have a clear user story for when there are multiple
>history server instances.
>
>Thanks,
>
>Jiangjie (Becket) Qin
>
>On Thu, Aug 14, 2025 at 1:56 PM Allison <achang5...@gmail.com> wrote:
>
>> Hi Yuepeng,
>>
>> Looks like this work can have some symbiosis with the change that I've
>> proposed here in FLIP-505. This addresses the question that Ryan asked
>> about whether or not remotely stored job archives will be impacted if the
>> retention is changed. Feel free to take a look at the FLIP as well as the
>> PR for FLIP-505. Looks like we have the opportunity to significantly
>> improve the History server with these two changes.
>>
>> FLIP-505:
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+505%3A+Flink+History+Server+Scability+Improvements%2C+Remote+Data+Store+Fetch+and+Per+Job+Fetch
>> PR: https://github.com/apache/flink/pull/26878
>>
>> Best,
>> Allison
>>
>>
>> On Thu, Aug 14, 2025 at 9:51 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>>
>> > Hi, Ryan van Huuksloot.
>> >
>> > > Might be worth stating that explicitly in the FLIP.
>> > Nice idea~ I have added a sub-section here[1] to clarify the item.
>> >
>> > Thanks a lot !
>> >
>> > [1]
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857#FLIP490:EnhancedJobHistoryRetentionPoliciesforHistoryServer-Thetimingtocheckwhethertargetfileshaveexceededtheretentionthresholds
>> >
>> > Best,
>> > Yuepeng Pan
>> >
>> > On 2025/08/14 16:27:39 Ryan van Huuksloot wrote:
>> > > That sounds like a good option.
>> > >
>> > > Might be worth stating that explicitly in the FLIP.
>> > >
>> > > No other questions from me - will be a nice extension!
>> > >
>> > > Ryan van Huuksloot
>> > > Staff Engineer, Infrastructure | Streaming Platform
>> > > [image: Shopify]
>> > > <
>> https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email
>> > >
>> > >
>> > >
>> > > On Thu, Aug 14, 2025 at 12:22 PM Yuepeng Pan <panyuep...@apache.org>
>> > wrote:
>> > >
>> > > > Hi, Ryan van Huuksloot.
>> > > >
>> > > > >Are you planning on having a thread to check for TTL? Or what is the
>> > > > >plan for TTL?
>> > > > >The quantity based would have a check when a new job is archived?
>> > > >
>> > > > Just like the implementation in the POC[1], what do you think about
>> > > > continuing to follow the process where the HistoryServer#start method
>> > > > periodically invokes HistoryServerArchiveFetcher#fetchArchives, based
>> > > > on 'historyserver.archive.fs.refresh-interval', to check whether
>> > > > target files should be retained?
>> > > > Of course, I'm very open to hearing about other potentially better
>> > > > implementation approaches.
>> > > > Please let me know your opinion.
>> > > > Thank you.
>> > > >
>> > > > [1] https://github.com/apache/flink/pull/26902
>> > > >
>> > > > Best,
>> > > > Yuepeng Pan
>> > > >
>> > > >
>> > > > On 2025/08/14 16:07:10 Ryan van Huuksloot wrote:
>> > > > > Thanks, sounds good.
>> > > > >
>> > > > > Are you planning on having a thread to check for TTL? Or what is
>> the
>> > plan
>> > > > > for TTL?
>> > > > > The quantity based would have a check when a new job is archived?
>> > > > >
>> > > > > Ryan van Huuksloot
>> > > > > Staff Engineer, Infrastructure | Streaming Platform
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, Aug 14, 2025 at 12:04 PM Yuepeng Pan <
>> panyuep...@apache.org>
>> > > > wrote:
>> > > > >
>> > > > > > Hi, Ryan van Huuksloot.
>> > > > > >
>> > > > > > Thank you very much for your reply.
>> > > > > >
>> > > > > > > Question: Is the History Server then going to delete the files
>> > > > > > > stored?
>> > > > > > > (i.e. we use GCS, would it delete the files there as well?)
>> > > > > > > Or is this strictly what is shown in the UI?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Yes, the feature introduced in the FLIP is a superset of the
>> > > > > > original feature controlled by
>> > > > > > 'historyserver.archive.retained-jobs'.
>> > > > > >
>> > > > > > So if I understand correctly, once the new feature is introduced,
>> > > > > > it would also affect the retention of job history files in remote
>> > > > > > distributed storage, not only what is shown in the UI.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > Yuepeng Pan
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > At 2025-08-14 23:34:54, "Ryan van Huuksloot"
>> > > > > > <ryan.vanhuuksl...@shopify.com.INVALID> wrote:
>> > > > > > >I took a look. Overall it would be nice to have more ways to
>> > > > configure the
>> > > > > > >History Server.
>> > > > > > >
>> > > > > > >Question: Is the History Server then going to delete the files
>> > stored?
>> > > > > > >(i.e. we use GCS, would it delete the files there as well?)
>> > > > > > >Or is this strictly what is shown in the UI?
>> > > > > > >
>> > > > > > >Ryan van Huuksloot
>> > > > > > >Staff Engineer, Infrastructure | Streaming Platform
>> > > > > > >
>> > > > > > >
>> > > > > > >On Thu, Aug 14, 2025 at 11:17 AM Yuepeng Pan <
>> > panyuep...@apache.org>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > >> Bumping this thread. Thanks!
>> > > > > > >>
>> > > > > > >> Best,
>> > > > > > >> Yuepeng Pan
>> > > > > > >>
>> > > > > > >> On 2025/08/11 03:49:27 Yuepeng Pan wrote:
>> > > > > > >> > Hi community,
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > Currently, HistoryServer supports only a quantity-based job
>> > > > archive
>> > > > > > >> retention policy [1].
>> > > > > > >> > This is insufficient for scenarios such as:
>> > > > > > >> > - Time-based retention (e.g., last X days).
>> > > > > > >> > - Combined rules (e.g., within 7 days AND ≤100 jobs).
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > To address these limitations, I’d like to start a discussion
>> > on
>> > > > > > FLIP-490
>> > > > > > >> [2],
>> > > > > > >> > which proposes a more flexible job archive retention
>> mechanism
>> > > > that
>> > > > > > >> supports time-based, quantity-based, and composite strategies
>> > (with
>> > > > > > AND/OR
>> > > > > > >> logic).
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > Looking forward to your feedback.
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > Best,
>> > > > > > >> > Yuepeng Pan
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > [1]
>> > > > > > >>
>> > > > > >
>> > > >
>> >
>> https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServer.java#L241
>> > > > > > >> > [2]
>> > > > > > >>
>> > > > > >
>> > > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
