Hi, community.

FYI:
To ensure that the rescale history recorded and stored by FLIP-495 can be 
accessed by external systems and users, we plan to release the FLIP-495 
functionality together with at least two sub-tasks[1][2] of FLIP-487[3].

These two sub-tasks will respectively support:
- retrieving all current rescale history records
- retrieving the detailed record of a specific rescale by its rescale UUID

[1] https://issues.apache.org/jira/browse/FLINK-38894
[2] https://issues.apache.org/jira/browse/FLINK-38895
[3] https://issues.apache.org/jira/browse/FLINK-22258

Best, 
Yuepeng Pan

On 2025/09/18 04:03:22 Yuepeng Pan wrote:
> Hi, community.
> 
> FYI:
> Since the design of the query interface for the rescale history was moved 
> into FLIP-487[1] during the discussion, we have changed the title 
> of the FLIP to:
> 
> FLIP-495: Support AdaptiveScheduler record and store the rescale history.
> 
> [1] https://cwiki.apache.org/confluence/x/vZCMEw
> 
> Best regards,
> Yuepeng Pan
> 
> On 2025/08/19 09:13:22 Yuepeng Pan wrote:
> > Kindly bumping this thread. Thanks!
> > 
> > Best,
> > Yuepeng Pan 
> > 
> > At 2025-08-13 14:52:26, "Yuepeng Pan" <[email protected]> wrote:
> > 
> > Hi, Matthias,
> > Thank you very much for your comments!
> > I have read your reply carefully and updated the FLIP accordingly.
> > Please take a look.
> > 
> > For your comments:
> > 
> > > 1. You mention a few options for when it comes to storing the data which is
> > > good. The FLIP doesn't point out, though, what option you're going to go
> > > for as part of this FLIP (as far as I can see). It would be good to only
> > > outline the option to go for in the FLIP and list the other options as
> > > rejected alternatives (with the pros and cons). I think it makes sense to
> > > go for option 3 (i.e. following what's done for the ExecutionGraphInfoStore
> > > for now). The other options can be considered as a follow-up.
> > 
> > Good point. Following this comment, I have kept option 3 in its 
> > original place and moved the other candidate options to [1].
> > 
> > > 2. About the terminal states of a rescaling (i.e. IGNORED, FAILED,
> > > COMPLETED): Can we clarify in the FLIP under what conditions the
> > > rescaling transitions into each of the three terminal states?
> > 
> > Yes, that is a reasonable request; it helps explain when a rescale 
> > transitions into each terminal state.
> > A new subsection [2] has been added to address this.
> > 
> > > 3. The section "The information to record in a rescale event" could be
> > > restructured in four sections (to remove redundancy):
> > > a) The IDs (Rescale ID, resourceRequirementsEpochID,
> > > subRescaleIdOfResourceRequirementsEpochID):
> > > What about making these names easier to read: GlobalRescaleID, RescaleUUID,
> > > RescaleAttemptId)
> > > b) Per-vertex data which includes: JobVertexID, JobVertexName,
> > > SlotSharingGroupId, the different parallelisms (pre-rescale, sufficient,
> > > desired, post-rescale)
> > > c) The SlotSharingGroup information: SlotSharingGroupId, name,
> > > ResourceProfile
> > > d) Other information: Timestamps of state transitions, etc. as laid out in
> > > the FLIP already
> > 
> > That makes sense to me. Please check [3] for the latest updates to this 
> > part.
> > 
> > > 4. The FLIP doesn't explain how the data is passed through the
> > > AdaptiveScheduler states. We should be handling some kind of
> > > RescaleSnapshot that is passed through the different states and updated and
> > > its final state is stored somewhere within AdaptiveScheduler in the end, I
> > > guess. Can we clarify that in the FLIP?
> > 
> > Indeed, this was missing in the original FLIP. To address this, I have 
> > added [4], which describes how a Rescale is represented
> > and how the rescale history is passed along and maintained.
> > 
> > > 5. You mention the config parameters for the cache in the public interface
> > > section. But there's no mention of any caching and how that is used
> > > within the FLIP.
> > 
> > Sorry for the rough description in the previous version.
> > This part belongs to the REST API acceleration mechanism for 
> > rescaling, and Option 6 seems reasonable to me,
> > so I plan to add it to FLIP-487 once the design of FLIP-495 has reached 
> > consensus.
> > Of course, if needed, I'd be happy to clarify the usage and purpose of this 
> > parameter in the current email thread.
> > 
> > > 6. The REST endpoint is probably better suited in FLIP-487. FLIP-495 should
> > > be about the actual implementation details and how the data is stored
> > > internally whereas FLIP-487 is about exposing the information to the
> > > outside through the REST API and the Flink UI. That would be a way to
> > > decrease the scope of FLIP-495. WDYT?
> > 
> > That sounds good to me, so I have moved all REST API-related 
> > changes to FLIP-487. 
> > BTW, to avoid repeated changes to FLIP-487, I'll start reorganizing 
> > it after FLIP-495 has been finalized.
> > 
> > Looking forward to your next review!
> > 
> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-Aboutrescaleeventsstorage.1
> > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-ThemainscenarioswhereRescalestatusswitchestoterminated
> > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-Theinformationtorecordinarescaleevent
> > [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-InternalInterfaces
> > 
> > Best regards,
> > Yuepeng Pan
> > 
> > At 2025-08-10 23:54:37, "Matthias Pohl" <[email protected]> wrote:
> > >Hi Yuepeng,
> > >thanks for reminding me of this FLIP. I went over it and have a few items
> > >which we might need to address before we can actually finalize the vote:
> > >
> > >1. You mention a few options for when it comes to storing the data which is
> > >good. The FLIP doesn't point out, though, what option you're going to go
> > >for as part of this FLIP (as far as I can see). It would be good to only
> > >outline the option to go for in the FLIP and list the other options as
> > >rejected alternatives (with the pros and cons). I think it makes sense to
> > >go for option 3 (i.e. following what's done for the ExecutionGraphInfoStore
> > >for now). The other options can be considered as a follow-up.
> > >2. About the terminal states of a rescaling (i.e. IGNORED, FAILED,
> > >COMPLETED): Can we clarify in the FLIP under what conditions the
> > >rescaling transitions into each of the three terminal states?
> > >3. The section "The information to record in a rescale event" could be
> > >restructured in four sections (to remove redundancy):
> > > a) The IDs (Rescale ID, resourceRequirementsEpochID,
> > >subRescaleIdOfResourceRequirementsEpochID):
> > >What about making these names easier to read: GlobalRescaleID, RescaleUUID,
> > >RescaleAttemptId)
> > > b) Per-vertex data which includes: JobVertexID, JobVertexName,
> > >SlotSharingGroupId, the different parallelisms (pre-rescale, sufficient,
> > >desired, post-rescale)
> > > c) The SlotSharingGroup information: SlotSharingGroupId, name,
> > >ResourceProfile
> > > d) Other information: Timestamps of state transitions, etc. as laid out in
> > >the FLIP already
> > >4. The FLIP doesn't explain how the data is passed through the
> > >AdaptiveScheduler states. We should be handling some kind of
> > >RescaleSnapshot that is passed through the different states and updated and
> > >its final state is stored somewhere within AdaptiveScheduler in the end, I
> > >guess. Can we clarify that in the FLIP?
> > >5. You mention the config parameters for the cache in the public interface
> > >section. But there's no mention of any caching and how that is used
> > >within the FLIP.
> > >6. The REST endpoint is probably better suited in FLIP-487. FLIP-495 should
> > >be about the actual implementation details and how the data is stored
> > >internally whereas FLIP-487 is about exposing the information to the
> > >outside through the REST API and the Flink UI. That would be a way to
> > >decrease the scope of FLIP-495. WDYT?
> > >
> > >Best,
> > >Matthias
> > >
> > >
> > >On Mon, Mar 24, 2025 at 11:37 AM Yuepeng Pan <[email protected]> wrote:
> > >
> > >> Hi, Community,
> > >>
> > >> There haven’t been any further responses to this email over the past few
> > >> days.
> > >> I'd like to initiate a vote on the current proposal[1] in the next few
> > >> days.
> > >> Please rest assured that I’m proceeding cautiously and not rushing the
> > >> process.
> > >> If there are any concerns about FLIP-495[1],
> > >> I will gladly pause and make adjustments.
> > >>
> > >> Best regards,
> > >> Yuepeng Pan
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> > >>
> > >>
> > >> On 2024/12/17 15:18:45 Yuepeng Pan wrote:
> > >> > Hi community,
> > >> >
> > >> > We discussed several aspects of FLIP-487[1] 'Show history of rescales in
> > >> > Web UI for AdaptiveScheduler'
> > >> > and received a lot of valuable feedback. Based on the suggestions from
> > >> > the email thread[2],
> > >> > we plan to split the original proposal for FLIP-487[1].
> > >> >
> > >> > The current email thread and the FLIP-495[3] wiki will be used to
> > >> > discuss 'Support AdaptiveScheduler in recording and querying the rescale
> > >> > history',
> > >> > while FLIP-487[1] will primarily focus on the display-related design
> > >> > content.
> > >> >
> > >> > Looking forward to any feedback and opinions on FLIP-495[3].
> > >> >
> > >> > [1]
> > >> > https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
> > >> >
> > >> > [2] https://lists.apache.org/thread/f4md4btkf006mxcxf66bng1kfz0rsn8c
> > >> >
> > >> > [3]
> > >> > https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> > >> >
> > >> > Thank you very much.
> > >> >
> > >> > Best regards,
> > >> > Yuepeng Pan
> > >>
> > 
> 
