Bumping this thread kindly. Thanks!

Best,
Yuepeng Pan 




At 2025-08-13 14:52:26, "Yuepeng Pan" <panyuep...@apache.org> wrote:

Hi, Matthias,
Thank you very much for your comments!
I have read your reply carefully and revised the FLIP accordingly.
Please take a look.

For your comments:

> 1. You mention a few options for when it comes to storing the data which is
> good. The FLIP doesn't point out, though, what option you're going to go
> for as part of this FLIP (as far as I can see). It would be good to only
> outline the option to go for in the FLIP and list the other options as
> rejected alternatives (with the pros and cons). I think it makes sense to
> go for option 3 (i.e. following what's done for the ExecutionGraphInfoStore
> for now). The other options can be considered as a follow-up.

Good point. Following this comment, I have kept option 3 in its original 
place and moved the other candidate options to [1].

> 2. About the terminal states of a rescaling (i.e. IGNORED, FAILED,
> COMPLETED): Can we clarify in the FLIP under what conditions the
> rescaling transitions into each of the three terminal states?

Yes, that is a reasonable request; it helps explain the logic behind the 
transitions into the terminal states.
I have added a new subsection [2] to address this.

> 3. The section "The information to record in a rescale event" could be
> restructured in four sections (to remove redundancy):
> a) The IDs (Rescale ID, resourceRequirementsEpochID,
> subRescaleIdOfResourceRequirementsEpochID): What about making these names
> easier to read (GlobalRescaleID, RescaleUUID, RescaleAttemptId)?
> b) Per-vertex data which includes: JobVertexID, JobVertexName,
> SlotSharingGroupId, the different parallelisms (pre-rescale, sufficient,
> desired, post-rescale)
> c) The SlotSharingGroup information: SlotSharingGroupId, name,
> ResourceProfile
> d) Other information: Timestamps of state transitions, etc. as laid out in
> the FLIP already

That makes sense to me. Please check [3] for the latest updates to this part.
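
To make the four-part structure from comment 3 concrete, here is a minimal Java sketch of how such a record could be grouped. All class names, field names and field types below are assumptions for illustration only (plain String and long values stand in for Flink types such as JobVertexID and ResourceProfile); the FLIP page [3] remains the reference for the actual definitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: names and types are assumptions, not the FLIP's API.

/** a) The three identifiers, using the more readable names suggested in the review. */
final class RescaleIds {
    final long globalRescaleId;
    final String rescaleUuid;
    final long rescaleAttemptId;

    RescaleIds(long globalRescaleId, String rescaleUuid, long rescaleAttemptId) {
        this.globalRescaleId = globalRescaleId;
        this.rescaleUuid = rescaleUuid;
        this.rescaleAttemptId = rescaleAttemptId;
    }
}

/** b) Per-vertex data, including the four parallelism values. */
final class VertexRescaleInfo {
    final String jobVertexId;
    final String jobVertexName;
    final String slotSharingGroupId;
    final int preRescaleParallelism;
    final int sufficientParallelism;
    final int desiredParallelism;
    final int postRescaleParallelism;

    VertexRescaleInfo(String jobVertexId, String jobVertexName, String slotSharingGroupId,
            int preRescale, int sufficient, int desired, int postRescale) {
        this.jobVertexId = jobVertexId;
        this.jobVertexName = jobVertexName;
        this.slotSharingGroupId = slotSharingGroupId;
        this.preRescaleParallelism = preRescale;
        this.sufficientParallelism = sufficient;
        this.desiredParallelism = desired;
        this.postRescaleParallelism = postRescale;
    }
}

/** c) Slot sharing group information; a String stands in for ResourceProfile. */
final class SlotSharingGroupInfo {
    final String slotSharingGroupId;
    final String name;
    final String resourceProfile;

    SlotSharingGroupInfo(String slotSharingGroupId, String name, String resourceProfile) {
        this.slotSharingGroupId = slotSharingGroupId;
        this.name = name;
        this.resourceProfile = resourceProfile;
    }
}

/** Groups a) to c) and carries d) other information, here a single example timestamp. */
final class RescaleRecord {
    final RescaleIds ids;
    final List<VertexRescaleInfo> vertices;
    final List<SlotSharingGroupInfo> slotSharingGroups;
    final long triggerTimestampMillis;

    RescaleRecord(RescaleIds ids, List<VertexRescaleInfo> vertices,
            List<SlotSharingGroupInfo> slotSharingGroups, long triggerTimestampMillis) {
        this.ids = ids;
        // Defensive copies keep the record immutable once it has been created.
        this.vertices = Collections.unmodifiableList(new ArrayList<>(vertices));
        this.slotSharingGroups = Collections.unmodifiableList(new ArrayList<>(slotSharingGroups));
        this.triggerTimestampMillis = triggerTimestampMillis;
    }
}
```

Keeping the slot sharing group details in their own list, referenced from each vertex by ID, is what removes the redundancy mentioned in the comment.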

> 4. The FLIP doesn't explain how the data is passed through the
> AdaptiveScheduler states. We should be handling some kind of
> RescaleSnapshot that is passed through the different states and updated and
> its final state is stored somewhere within AdaptiveScheduler in the end, I
> guess. Can we clarify that in the FLIP?

Indeed, this was missing in the original FLIP. To address this, I have added 
[4], which describes how a Rescale is represented
and how the Rescale history can be passed along and maintained efficiently.
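
As a rough illustration of the idea in comment 4 (a snapshot that is passed through the scheduler states, updated, and finally stored), here is a minimal Java sketch. The class names, the IN_PROGRESS status, the single rescaleId field and the bounded in-memory history are all assumptions made for this example; only the three terminal states IGNORED, FAILED and COMPLETED come from this thread, and [4] describes the actual design.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only: none of these types are taken from the FLIP.

/** Rescale statuses; IN_PROGRESS is an assumed placeholder for all non-terminal states. */
enum RescaleStatus {
    IN_PROGRESS(false), IGNORED(true), FAILED(true), COMPLETED(true);

    private final boolean terminal;

    RescaleStatus(boolean terminal) {
        this.terminal = terminal;
    }

    boolean isTerminal() {
        return terminal;
    }
}

/** Immutable snapshot that one scheduler state hands over to the next. */
final class RescaleSnapshot {
    final long rescaleId;
    final RescaleStatus status;

    RescaleSnapshot(long rescaleId, RescaleStatus status) {
        this.rescaleId = rescaleId;
        this.status = status;
    }

    /** Returns an updated copy; a terminated rescale must not change any more. */
    RescaleSnapshot withStatus(RescaleStatus newStatus) {
        if (status.isTerminal()) {
            throw new IllegalStateException("Rescale " + rescaleId + " is already terminated.");
        }
        return new RescaleSnapshot(rescaleId, newStatus);
    }
}

/** Bounded history of terminated rescales; the oldest entry is evicted first. */
final class RescaleHistory {
    private final int capacity;
    private final Deque<RescaleSnapshot> entries = new ArrayDeque<>();

    RescaleHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    void add(RescaleSnapshot snapshot) {
        if (!snapshot.status.isTerminal()) {
            throw new IllegalArgumentException("Only terminated rescales are stored.");
        }
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(snapshot);
    }

    /** Returns a copy ordered from oldest to newest. */
    List<RescaleSnapshot> entries() {
        return new ArrayList<>(entries);
    }
}
```

In this sketch each state would call withStatus(...) on the snapshot it received and pass the result on, and the scheduler would add the snapshot to the history once it reaches a terminal status.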

> 5. You mention the config parameters for the cache in the public interface
> section. But there's no mention of any caching or how it is used
> within the FLIP.

Sorry for the rough description in the previous version.
Since this part belongs to the REST API acceleration mechanism for rescaling, 
and comment 6 seems reasonable to me,
I plan to add it to FLIP-487 once the design of FLIP-495 has reached consensus.
Of course, if needed, I'd be happy to clarify the usage and purpose of this 
parameter in the current email thread.

> 6. The REST endpoint is probably better suited in FLIP-487. FLIP-495 should
> be about the actual implementation details and how the data is stored
> internally whereas FLIP-487 is about exposing the information to the
> outside through the REST API and the Flink UI. That would be a way to
> decrease the scope of FLIP-495. WDYT?

That sounds good to me, so I have moved all REST API-related changes to 
FLIP-487.
BTW, to avoid repeated changes to FLIP-487, I'll start reorganizing it 
after FLIP-495 has been finalized.

Looking forward to your next review!

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-Aboutrescaleeventsstorage.1
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-ThemainscenarioswhereRescalestatusswitchestoterminated
[3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-Theinformationtorecordinarescaleevent
[4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=334760525#FLIP495:SupportAdaptiveSchedulerrecordandquerytherescalehistory-InternalInterfaces





Best regards,
Yuepeng Pan











At 2025-08-10 23:54:37, "Matthias Pohl" <map...@apache.org> wrote:
>Hi Yuepeng,
>thanks for reminding me of this FLIP. I went over it and have a few items
>which we might need to address before we can actually finalize the vote:
>
>1. You mention a few options for when it comes to storing the data which is
>good. The FLIP doesn't point out, though, what option you're going to go
>for as part of this FLIP (as far as I can see). It would be good to only
>outline the option to go for in the FLIP and list the other options as
>rejected alternatives (with the pros and cons). I think it makes sense to
>go for option 3 (i.e. following what's done for the ExecutionGraphInfoStore
>for now). The other options can be considered as a follow-up.
>2. About the terminal states of a rescaling (i.e. IGNORED, FAILED,
>COMPLETED): Can we clarify in the FLIP under what conditions the
>rescaling transitions into each of the three terminal states?
>3. The section "The information to record in a rescale event" could be
>restructured in four sections (to remove redundancy):
> a) The IDs (Rescale ID, resourceRequirementsEpochID,
>subRescaleIdOfResourceRequirementsEpochID): What about making these names
>easier to read (GlobalRescaleID, RescaleUUID, RescaleAttemptId)?
> b) Per-vertex data which includes: JobVertexID, JobVertexName,
>SlotSharingGroupId, the different parallelisms (pre-rescale, sufficient,
>desired, post-rescale)
> c) The SlotSharingGroup information: SlotSharingGroupId, name,
>ResourceProfile
> d) Other information: Timestamps of state transitions, etc. as laid out in
>the FLIP already
>4. The FLIP doesn't explain how the data is passed through the
>AdaptiveScheduler states. We should be handling some kind of
>RescaleSnapshot that is passed through the different states and updated and
>its final state is stored somewhere within AdaptiveScheduler in the end, I
>guess. Can we clarify that in the FLIP?
>5. You mention the config parameters for the cache in the public interface
>section. But there's no mention of any caching or how it is used
>within the FLIP.
>6. The REST endpoint is probably better suited in FLIP-487. FLIP-495 should
>be about the actual implementation details and how the data is stored
>internally whereas FLIP-487 is about exposing the information to the
>outside through the REST API and the Flink UI. That would be a way to
>decrease the scope of FLIP-495. WDYT?
>
>Best,
>Matthias
>
>
>On Mon, Mar 24, 2025 at 11:37 AM Yuepeng Pan <panyuep...@apache.org> wrote:
>
>> Hi, Community,
>>
>> There haven’t been any further responses to this email over the past few
>> days.
>> I'd like to initiate a vote on the current proposal[1] in the next few
>> days.
>> Please rest assured that I’m proceeding cautiously and not rushing the
>> process.
>> If there are any concerns about this FLIP-495[1],
>> I will gladly pause and make the adjustments.
>>
>> Best regards,
>> Yuepeng Pan
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
>>
>>
>> On 2024/12/17 15:18:45 Yuepeng Pan wrote:
>> > Hi community,
>> >
>> >
>> >
>> >
>> > We discussed several aspects of FLIP-487[1] 'Show history of rescales in
>> Web UI for AdaptiveScheduler'
>> > and received a lot of valuable feedback. Based on the suggestions from
>> the email thread[2],
>> > we plan to split the original proposal for FLIP-487[1].
>> >
>> >
>> >
>> >
>> > The current email thread and the FLIP-495[3] wiki will be used to
>> discuss 'Support AdaptiveScheduler in recording and querying the rescale
>> history',
>> > while FLIP-487[1] will primarily focus on the display-related design.
>> >
>> >
>> >
>> >
>> > Looking forward to any feedback and opinions on FLIP-495[3].
>> >
>> >
>> >
>> >
>> > [1]
>> https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
>> >
>> > [2] https://lists.apache.org/thread/f4md4btkf006mxcxf66bng1kfz0rsn8c
>> >
>> > [3]
>> https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
>> >
>> >
>> >
>> >
>> > Thank you very much.
>> >
>> >
>> >
>> >
>> > Best,
>> >
>> > Regards.
>> >
>> > Yuepeng Pan
>>
