Wow~
Thanks Matthias for resurfacing the voting thread[1].
I almost forgot about the thread I had initiated earlier.
In that case, let's proceed with the vote on that existing thread; nothing
could be better.

Thank you for the timely reminder and your continued support as always!

[1] https://lists.apache.org/thread/1j5dkz4rzzp6htbo6s1w9c2qsvfjw8to

Best regards,
Yuepeng Pan



Matthias Pohl <[email protected]> wrote on Wed, Jan 7, 2026 at 23:29:

> There's no need to open another voting thread. I pushed the existing one
> [1] for FLIP-487 [2].
> Thanks again for driving this, Yuepeng.
>
> Best,
> Matthias
>
> [1] https://lists.apache.org/thread/1j5dkz4rzzp6htbo6s1w9c2qsvfjw8to
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
>
> On Tue, Jan 6, 2026 at 4:27 AM Yuepeng Pan <[email protected]> wrote:
>
> > Hi, community.
> >
> > This discussion has been ongoing for some time, and I sincerely appreciate
> > the attention and support from the developers.
> > If there is no further feedback this week, I will initiate a vote next
> > week.
> >
> >
> > Best regards,
> >
> > Yuepeng Pan
> >
> > Yuepeng Pan <[email protected]> wrote on Mon, Jan 5, 2026 at 16:30:
> >
> > > Thank you, Matthias.
> > >
> > > > - I guess, you don't have to add the entire old section with the
> > > > screenshots to the Rejected alternatives. The summary paragraph is good
> > > > enough
> > >
> > > Yes, I deleted the redundant screenshots and information and kept only
> > > the core summary paragraphs.
> > >
> > > > - There's a duplicated sentence under "The Web UI and REST interfaces"
> > > > > The design of the rescale history UI will follow the style of the
> > > > > checkpoints-related pages.
> > > > > But the design of the rescale history REST API will follow the style
> > > > > of the checkpoints-related interfaces.
> > >
> > > Thanks for your detailed review.
> > > You are right, that was a typo.
> > > I have updated it; let me try to clarify.
> > > What I originally meant to express was
> > > 'But the design of the rescale history REST API will not fully follow the
> > > style of the checkpoints-related interfaces.',
> > > because we split the old interface, which is now described in the rejected
> > > version, into three new smaller interfaces.
> > >
> > >
> > > Best,
> > > Yuepeng Pan
> > >
> > >
> > > Matthias Pohl <[email protected]> wrote on Mon, Jan 5, 2026 at 15:17:
> > >
> > >> Thank you. Nothing to add from my side aside from the following cosmetic
> > >> items:
> > >> - I guess, you don't have to add the entire old section with the
> > >> screenshots to the Rejected alternatives. The summary paragraph is good
> > >> enough
> > >> - There's a duplicated sentence under "The Web UI and REST interfaces"
> > >> > The design of the rescale history UI will follow the style of the
> > >> > checkpoints-related pages.
> > >> > But the design of the rescale history REST API will follow the style of
> > >> > the checkpoints-related interfaces.
> > >>
> > >> Matthias
> > >>
> > >> On Fri, Jan 2, 2026 at 6:19 PM Yuepeng Pan <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi, Matthias.
> > >> > No worries~ and thank you very much for your comments.
> > >> >
> > >> > I made some adjustments based on your suggestions.
> > >> >
> > >> > > - The link to the sketch (section "The Web UI and REST interfaces")
> > >> > > could be removed. We should add any missing screenshots to the FLIP and
> > >> > > not rely on external resources.
> > >> >
> > >> > Deleted; all of the UI pages are now pasted into the wiki page.
> > >> > All relevant pages had already been posted to the wiki in the original
> > >> > versions, so I have only removed the source file URLs.
> > >> >
> > >> > > - Maybe, add to the "Rescale Overview UI" section that the goal is to
> > >> > > have the rescale overview aligned with the checkpoint overview
> > >> > > - For the /jobs/:jobid/rescales endpoint, splitting it up into three
> > >> > > endpoints /jobs/:jobid/rescales/{summary,history,overview} might be a
> > >> > > good idea. For /config, we do it like that. But I also see the point of
> > >> > > keeping it as you proposed because we said we want to be close to what
> > >> > > the checkpoint REST endpoint and UI provides. Your call - you can list
> > >> > > the option that you didn't go for under "Rejected Alternatives" to give
> > >> > > more context around the goal that we wanted to keep the Rescale UI/REST
> > >> > > API close to what is available for checkpoints.
> > >> >
> > >> > The idea you mentioned makes sense to me.
> > >> > I updated the corresponding part based on your suggestion.
> > >> > PTAL~
> > >> >
> > >> > > - Under "Rescale Details UI" you added a sentence (below the
> > >> screenshot)
> > >> > > that feels like it should be fixed: "the items need todo keep same
> > as
> > >> > > mentioned Rescale Overview UI"
> > >> >
> > >> > Deleted.
> > >> >
> > >> > > - You can add a self-explanatory description for "Compatibility,
> > >> > > Deprecation, and Migration Plan" (e.g. No previous work needs to
> be
> > >> > > considered)
> > >> > > - Test Plan: REST endpoints will be tested with the RestHandler
> > >> > framework.
> > >> > > The UI will be tested visually through manual testing, I guess.
> > >> >
> > >> > Done.
> > >> >
> > >> >
> > >> > I'd appreciate any input.
> > >> >
> > >> > Best regards,
> > >> > Yuepeng Pan
> > >> >
> > >> >
> > >> > Matthias Pohl via dev <[email protected]> wrote on Sat, Jan 3, 2026 at 00:15:
> > >> >
> > >> >> Looks like I mixed things up when replying to your message and it
> > >> ended up
> > >> >> in the wrong thread. Apologies for the confusion. See my message
> > below:
> > >> >>
> > >> >> Happy New Year to you, too. I have nothing major to add here. Just
> a
> > >> few
> > >> >> minor things:
> > >> >>
> > >> >> - The link to the sketch (section "The Web UI and REST interfaces")
> > >> could
> > >> >> be removed. We should add any missing screenshots to the FLIP and
> not
> > >> rely
> > >> >> on external resources.
> > >> >> - Maybe, add to the "Rescale Overview UI" section that the goal is
> to
> > >> have
> > >> >> the rescale overview aligned with the checkpoint overview
> > >> >> - For the /jobs/:jobid/rescales endpoint, splitting it up into
> three
> > >> >> endpoints /jobs/:jobid/rescales/{summary,history,overview} might
> be a
> > >> good
> > >> >> idea. For /config, we do it like that. But I also see the point of
> > >> keeping
> > >> >> it as you proposed because we said we want to be close to what the
> > >> >> checkpoint REST endpoint and UI provides. Your call - you can list
> > the
> > >> >> option that you didn't go for under "Rejected Alternatives" to give
> > >> more
> > >> >> context around the goal that we wanted to keep the Rescale UI/REST
> > API
> > >> >> close to what is available for checkpoints.
> > >> >> - Under "Rescale Details UI" you added a sentence (below the screenshot)
> > >> >> that feels like it should be fixed: "the items need todo keep same as
> > >> >> mentioned Rescale Overview UI"
> > >> >> - You can add a self-explanatory description for "Compatibility,
> > >> >> Deprecation, and Migration Plan" (e.g. No previous work needs to be
> > >> >> considered)
> > >> >> - Test Plan: REST endpoints will be tested with the RestHandler
> > >> framework.
> > >> >> The UI will be tested visually through manual testing, I guess.
> > >> >>
> > >> >> Best,
> > >> >> Matthias
> > >> >>
> > >> >> On Wed, Dec 31, 2025 at 5:37 PM Yuepeng Pan <[email protected]> wrote:
> > >> >>
> > >> >> > Hi, Matthias.
> > >> >> > Thank you for your review and Happy New Year!
> > >> >> >
> > >> >> >
> > >> >> > a. About JSON schema:
> > >> >> >
> > >> >> > > You are right. Existing fields shouldn't be modified. Only for
> > new
> > >> >> ones,
> > >> >> > we
> > >> >> > > can make sure to not introduce more inconsistencies.
> > >> >> >
> > >> >> > > In general, the problem is that the JSON formatting is not
> > >> specified
> > >> >> in
> > >> >> > the
> > >> >> > > coding guidelines. That's why it comes with no surprise that
> > these
> > >> >> > > formatting inconsistencies exist. We would need to start a
> > >> discussion
> > >> >> on
> > >> >> > > updating the Flink coding guidelines first. Only afterwards, we
> > >> could
> > >> >> fix
> > >> >> > > the formatting.
> > >> >> >
> > >> >> > > Such a change would need to be rolled out as part of a major
> > >> version
> > >> >> > (e.g.
> > >> >> > > 3.0) only, though.
> > >> >> >
> > >> >> > Thanks for your confirmation & ideas.
> > >> >> > That sounds good to me!
> > >> >> >
> > >> >> > I’ve created a new Jira ticket[1] so that community contributors
> > can
> > >> >> track
> > >> >> > this new, independent piece of work.
> > >> >> >
> > >> >> >
> > >> >> > b. About the durationInMillis attribute
> > >> >> >
> > >> >> > Thanks for your response.
> > >> >> > I removed durationInMillis from the corresponding JSON schemas of the
> > >> >> > REST API and added a short explanation of the reason for dropping
> > >> >> > 'durationInMillis'.
> > >> >> >
> > >> >> >
> > >> >> > Any input is appreciated!
> > >> >> >
> > >> >> >
> > >> >> > [1] https://issues.apache.org/jira/browse/FLINK-38853
> > >> >> >
> > >> >> >
> > >> >> > Best regards,
> > >> >> > Yuepeng Pan
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > Matthias Pohl <[email protected]> wrote on Wed, Dec 31, 2025 at 22:34:
> > >> >> >
> > >> >> > > Thanks for the quick response. I added my responses inline.
> PTAL
> > >> >> > >
> > >> >> > > Best,
> > >> >> > > Matthias
> > >> >> > >
> > >> >> > > On Mon, 22 Dec 2025, 01:02 Yuepeng Pan, <[email protected]> wrote:
> > >> >> > >
> > >> >> > > > Hi, Matthias, I'm glad to see that email.
> > >> >> > > > And thank you very much for your review and comments.
> > >> >> > > >
> > >> >> > > > To facilitate reading and discussion,
> > >> >> > > > I have grouped related questions together as much as possible
> > >> >> > > > when organizing my responses to your comments,
> > >> >> > > > and I hope this will not cause any inconvenience.
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > 1. Reference typo & format.
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > > Adaptive Scheduler will support record and query the rescale
> > >> >> > > > > history in[2]
> > >> >> > > > > Shouldn't it refer to reference #3, i.e. FLIP-495?
> > >> >> > > > > nit: In the wiki, we do not need to add the references but use
> > >> >> > > > > links with proper link text (e.g. in the motivation paragraph).
> > >> >> > > > > That should improve readability.
> > >> >> > > >
> > >> >> > > > Thanks for catching this and for the suggestions. That makes sense
> > >> >> > > > to me.
> > >> >> > > > I corrected the citation errors and reformatted the references you
> > >> >> > > > mentioned throughout the entire document.
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > 2. Schemas:
> > >> >> > > >
> > >> >> > > > a. schema of the response for /jobs/overview
> > >> >> > > >
> > >> >> > > > > extended schema of the response for /jobs/overview
> > >> >> > > >
> > >> >> > > > > The extract of the schema extension is not precise: We should
> > >> >> > > > > show that the new fields are added to the item type
> > >> >> > > > > (urn:jsonschema:org:apache:flink:runtime:messages:webmonitor:JobDetails).
> > >> >> > > > > About the field name formatting of "job-type": We still do not
> > >> >> > > > > have this one included in the code convention. But AFAIS, we
> > >> >> > > > > usually follow camelCase format rather than kebab-case. But
> > >> >> > > > > especially the Job overview uses both already.
> > >> >> > > >
> > >> >> > > > Thanks for the comments.
> > >> >> > > > That sounds good to me.
> > >> >> > > > I have updated the corresponding changes to the JobDetails class.
> > >> >> > > >
> > >> >> > > > b. schema of response for /jobs/:jobid/rescales
> > >> >> > > >
> > >> >> > > > > Schema of response for /jobs/:jobid/rescales
> > >> >> > > > > I noticed that also for the other JSON schemas, we jump
> > between
> > >> >> > formats
> > >> >> > > > > (even introducing snake_casing). Let's unify them and stick
> > to
> > >> >> > > camelCase.
> > >> >> > > > > WDYT?
> > >> >> > > >
> > >> >> > > > Nice idea!
> > >> >> > > > Considering compatibility and the workload associated with this
> > >> >> > > > FLIP, the existing fields are not modified in the current FLIP;
> > >> >> > > > only the newly introduced fields are named
> > >> >> > > > following the camelCase naming convention.
> > >> >> > > > I updated the schema lines that needed to change.
> > >> >> > >
> > >> >> > >
> > >> >> > > > Regarding the naming style changes for all fields in schemas that
> > >> >> > > > are modified (as opposed to newly introduced) within this FLIP, do
> > >> >> > > > we need a new FLIP to address and unify such work?
> > >> >> > > > This way, the new FLIP would focus solely on this type of task.
> > >> >> > > > What do you think about it?
> > >> >> > > >
> > >> >> > >
> > >> >> > > You are right. Existing fields shouldn't be modified. Only for
> > new
> > >> >> ones,
> > >> >> > we
> > >> >> > > can make sure to not introduce more inconsistencies.
> > >> >> > >
> > >> >> > > In general, the problem is that the JSON formatting is not
> > >> specified
> > >> >> in
> > >> >> > the
> > >> >> > > coding guidelines. That's why it comes with no surprise that
> > these
> > >> >> > > formatting inconsistencies exist. We would need to start a
> > >> discussion
> > >> >> on
> > >> >> > > updating the Flink coding guidelines first. Only afterwards, we
> > >> could
> > >> >> fix
> > >> >> > > the formatting.
> > >> >> > >
> > >> >> > > Such a change would need to be rolled out as part of a major
> > >> version
> > >> >> > (e.g.
> > >> >> > > 3.0) only, though.
> > >> >> > >
> > >> >> > >
> > >> >> > > > c. For "summary.rescaleCounts"
> > >> >> > > >
> > >> >> > > > > For "summary.rescaleCounts", we might not need to add the
> > >> >> "_rescales"
> > >> >> > > > > suffix to the record fields since the parent indicates
> > already
> > >> >> that
> > >> >> > all
> > >> >> > > > of
> > >> >> > > > > the fields are rescale counts. We, therefore, could use
> > >> >> "inProgress",
> > >> >> > > > > "ignored", "completed", "failed".
> > >> >> > > >
> > >> >> > > > Yes, this indeed makes the expression more concise and to the
> > >> point.
> > >> >> > > > I updated this part.
> > >> >> > > >
> > >> >> > > > > Do we see value in adding the total
> > >> >> > > > > value? That could be easily calculated using the other four
> > >> >> metrics.
> > >> >> > > > Hence,
> > >> >> > > > > I think we can consider it as being redundant and remove
> it.
> > >> >> > > >
> > >> >> > > > This is acceptable; the only difference is whether the total value
> > >> >> > > > is calculated on the frontend or on the backend.
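A minimal sketch of deriving the total on the client side, assuming the response carries the four counters discussed above under "summary.rescaleCounts". The `RescaleCounts` interface and the `totalRescales` helper are hypothetical names used only for illustration; they are not taken from the FLIP.

```typescript
// Hypothetical shape of "summary.rescaleCounts"; the field names follow the
// camelCase names suggested in this thread.
interface RescaleCounts {
  inProgress: number;
  ignored: number;
  completed: number;
  failed: number;
}

// Derive the total on the client instead of shipping a redundant field.
function totalRescales(counts: RescaleCounts): number {
  return counts.inProgress + counts.ignored + counts.completed + counts.failed;
}
```

With this, the REST response only needs the four per-state counters, and the UI can still show a total.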
> > >> >> > > >
> > >> >> > > > d. rescalesDurationStats/rescales_duration_stats (the previous edition)
> > >> >> > > >
> > >> >> > > > > "rescales_duration_stats"
> > >> >> > > > > For all the "durationStats"? Can we add the time unit to
> make
> > >> >> things
> > >> >> > > > > clearer, e.g. "rescalesDurationStats" becomes
> > >> >> > > > > "rescalesDurationStatsInMillis"? ...same applies to the
> > >> timestamps
> > >> >> > > >
> > >> >> > > > Good idea~.
> > >> >> > > > I updated the descriptions of all timestamp-related attributes.
> > >> >> > > > Please help take a look!
> > >> >> > > >
> > >> >> > > > e. ignoredRescalesDurationStats/ignored_rescales_duration_stats (the
> > >> >> > > > previous edition)
> > >> >> > > >
> > >> >> > > > > "ignored_rescales_duration_stats"
> > >> >> > > > > Are the stats useful for rescales which were actually not
> > >> >> executed?
> > >> >> > > >
> > >> >> > > > Answering this question may be a bit difficult for me.
> > >> >> > > > In theory, since rescale operations of the Ignored type can
> > >> occur,
> > >> >> > > > it is reasonable to include them in the statistics—at least
> > >> >> > > > from the perspective of having a complete set of dimensions.
> > >> >> > > > In addition, I'm not certain whether users truly do not care
> > >> >> > > > about statistics for this type of data.
> > >> >> > > > Therefore, I kept it in the initial design document.
> > >> >> > > > If you think it is unnecessary to retain this data,
> > >> >> > > > we can exclude Ignored rescale types from the duration
> > >> statistics.
> > >> >> > > > I would appreciate your experience and opinion on this.
> > >> >> > >
> > >> >> > >
> > >> >> > > Fair enough.
> > >> >> > >
> > >> >> > > f. the durationInMillis attribute.
> > >> >> > >
> > >> >> > >
> > >> >> > > > > duration
> > >> >> > > > > Rescale details already contain the start and end time.
> > Adding
> > >> the
> > >> >> > > > duration
> > >> >> > > > > here shouldn't be necessary.
> > >> >> > > >
> > >> >> > > > If the frontend page does not involve overly complex display
> > >> logic,
> > >> >> > > > adding an additional durationInMillis field here should be
> > >> >> unnecessary.
> > >> >> > > >
> > >> >> > >
> > >> >> > > Just to clarify: I don't suggest removing the duration
> > information
> > >> >> from
> > >> >> > the
> > >> >> > > web UI. It's only obsolete in the REST API because it can be
> > >> >> calculated
> > >> >> > on
> > >> >> > > the client side.
> > >> >> > >
> > >> >> > >
> > >> >> > > >
> > >> >> > > > 3. UI
> > >> >> > > >
> > >> >> > > > a. Rescale History UI(related to 'durationInMillis'
> attribute)
> > >> >> > > >
> > >> >> > > > > Rescale History UI
> > >> >> > > > > The history looks nice. What about making the duration of the
> > >> >> > > > > inProgress rescales dynamic, i.e. counting the seconds up from
> > >> >> > > > > the start time? Keeping the NA is also fine if the dynamic
> > >> >> > > > > approach is too complicated.
> > >> >> > > >
> > >> >> > > > In my limited reading,
> > >> >> > > > this is feasible from an implementation perspective,
> > >> >> > > > though it may require some adjustments.
> > >> >> > > > If we remove the durationInMillis field from rescale,
> > >> >> > > > the frontend would need to perform some additional processing
> > >> when
> > >> >> > > > displaying the data.
> > >> >> > > > For example:
> > >> >> > > > rescale{terminalState=inProgress, startTimestampInMillis=1,
> > >> >> > > > endTimestampInMillis=null, durationInMillis=3}
> > >> >> > > > If we keep the durationInMillis field, the frontend would need
> > >> >> > > > almost no logic and could simply display the data as is.
> > >> >> > > > If we do not keep the durationInMillis field, the frontend would
> > >> >> > > > need to do two things when rendering:
> > >> >> > > >   - Calculate durationInMillis based on startTimestampInMillis and
> > >> >> > > >     endTimestampInMillis
> > >> >> > > >   - When displaying records with terminalState = inProgress, show
> > >> >> > > >     endTimestampInMillis as null
> > >> >> > > >
> > >> >> > > > Similarly, for handling durationInMillis in schedulerState,
> > >> >> > > > I'm not sure whether such scenarios would arise,
> > >> >> > > > although we have not yet considered
> > >> >> > > > whether this data should be displayed in the same way as
> > >> >> > > > Rescale.durationInMillis.
> > >> >> > > > Although the difference is small,
> > >> >> > > > it is worth clarifying so that we can better evaluate the
> > >> decision.
> > >> >> > > >
> > >> >> > > > Therefore, please let me know your thoughts on
> > >> >> > > > - whether we should keep the durationInMillis field for both
> > >> >> > > >   Rescale and schedulerState in the schema,
> > >> >> > > > - whether we should show N/A as the duration of in-progress
> > >> >> > > >   rescales and remove durationInMillis from the related sub-JSON,
> > >> >> > > > - or any other reasonable option you see.
> > >> >> > > >
> > >> >> > >
> > >> >> > > As mentioned in 2.f), I would remove the duration and calculate it
> > >> >> > > dynamically in the client code. It shouldn't be too complex an
> > >> >> > > operation and allows us to keep the duration dynamic for rescales in
> > >> >> > > progress.
> > >> >> > >
> > >> >> > > > b. Rescale Overview UI.
> > >> >> > > >
> > >> >> > > > > Rescale Overview UI
> > >> >> > > > > The screenshot shows "Acquired profile" twice for the slot
> > >> (based
> > >> >> on
> > >> >> > > the
> > >> >> > > > > details UI, the first one is supposed to be "required").
> > >> >> > > >
> > >> >> > > > Sorry for the typo. I corrected it.
> > >> >> > > >
> > >> >> > > > > Additionally, in
> > >> >> > > > > FLIP-495 we agreed on four metrics: previous, sufficient,
> > >> desired
> > >> >> and
> > >> >> > > > > acquired resources (for parallelism and profile). Should we
> > use
> > >> >> those
> > >> >> > > in
> > >> >> > > > > the UI as well?
> > >> >> > > >
> > >> >> > > > Okay. Updated it in the related UI draft pages.
> > >> >> > > >
> > >> >> > > > > We might want to add tooltips to the headers as well to
> > >> >> > > > > add a description for each of the metrics.
> > >> >> > > >
> > >> >> > > > > Could we add tooltips to the headers of the rescale
> overview
> > to
> > >> >> > > describe
> > >> >> > > > the different IDs?
> > >> >> > > >
> > >> >> > > > Yes, the suggestion is reasonable.
> > >> >> > > > I added descriptions of the tooltip texts for some core header
> > >> >> > > > attributes after the corresponding UI draft pages.
> > >> >> > > > Looking forward to your opinion.
> > >> >> > > >
> > >> >> > > > 4. Items newly added by me:
> > >> >> > > > I have added notes after some sections of the core UI pages
> > >> >> > > > about limiting the displayed length of UUID-type identifiers and
> > >> >> > > > about issues related to task names.
> > >> >> > > >
> > >> >> > > > I'd greatly appreciate any suggestions you may have.
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > Best regards,
> > >> >> > > > Yuepeng Pan
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > Matthias Pohl <[email protected]> wrote on Thu, Dec 18, 2025 at 18:08:
> > >> >> > > >
> > >> >> > > > > Hi Yuepeng,
> > >> >> > > > > I finally found some time to look into that FLIP again.
> Sorry
> > >> for
> > >> >> the
> > >> >> > > > > delay. Thanks for working on this topic and pushing it.
> Here
> > >> are a
> > >> >> > few
> > >> >> > > > more
> > >> >> > > > > comments on the current state of FLIP-487:
> > >> >> > > > >
> > >> >> > > > > Adaptive Scheduler will support record and query the
> rescale
> > >> >> history
> > >> >> > > > in[2].
> > >> >> > > > >
> > >> >> > > > > Shouldn't it refer to reference #3, i.e. FLIP-495?
> > >> >> > > > >
> > >> >> > > > > nit: In the wiki, we do not need to add the references but
> > use
> > >> >> links
> > >> >> > > with
> > >> >> > > > > proper link text (e.g. in the motivation paragraph). That
> > >> should
> > >> >> > > improve
> > >> >> > > > > readability.
> > >> >> > > > >
> > >> >> > > > > extended schema of the response for /jobs/overview
> > >> >> > > > >
> > >> >> > > > > The extract of the schema extension is not precise: We should
> > >> >> > > > > show that the new fields are added to the item type
> > >> >> > > > > (urn:jsonschema:org:apache:flink:runtime:messages:webmonitor:JobDetails).
> > >> >> > > > > About the field name formatting of "job-type": We still do not
> > >> >> > > > > have this one included in the code convention. But AFAIS, we
> > >> >> > > > > usually follow camelCase format rather than kebab-case. But
> > >> >> > > > > especially the Job overview uses both already.
> > >> >> > > > >
> > >> >> > > > > Could we add tool tips to the headers of the rescale
> overview
> > >> to
> > >> >> > > describe
> > >> >> > > > > the different IDs?
> > >> >> > > > >
> > >> >> > > > > Schema of response for /jobs/:jobid/rescales
> > >> >> > > > >
> > >> >> > > > > I noticed that also for the other JSON schemas, we jump
> > between
> > >> >> > formats
> > >> >> > > > > (even introducing snake_casing). Let's unify them and stick
> > to
> > >> >> > > camelCase.
> > >> >> > > > > WDYT?
> > >> >> > > > >
> > >> >> > > > > For "summary.rescaleCounts", we might not need to add the
> > >> >> > > > > "_rescales" suffix to the record fields since the parent already
> > >> >> > > > > indicates that all of the fields are rescale counts. We,
> > >> >> > > > > therefore, could use "inProgress", "ignored", "completed",
> > >> >> > > > > "failed". Do we see value in adding the total value? That could
> > >> >> > > > > be easily calculated using the other four metrics. Hence, I
> > >> >> > > > > think we can consider it as being redundant and remove it.
> > >> >> > > > >
> > >> >> > > > > "rescales_duration_stats"
> > >> >> > > > >
> > >> >> > > > > For all the "durationStats"? Can we add the time unit to
> make
> > >> >> things
> > >> >> > > > > clearer, e.g. "rescalesDurationStats" becomes
> > >> >> > > > > "rescalesDurationStatsInMillis"? ...same applies to the
> > >> timestamps
> > >> >> > > > >
> > >> >> > > > > "ignored_rescales_duration_stats"
> > >> >> > > > >
> > >> >> > > > > Are the stats useful for rescales which were actually not
> > >> >> executed?
> > >> >> > > > >
> > >> >> > > > > duration
> > >> >> > > > >
> > >> >> > > > > Rescale details already contain the start and end time.
> > Adding
> > >> the
> > >> >> > > > duration
> > >> >> > > > > here shouldn't be necessary.
> > >> >> > > > >
> > >> >> > > > > Rescale Overview UI
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > > > The screenshot shows "Acquired profile" twice for the slot
> > >> (based
> > >> >> on
> > >> >> > > the
> > >> >> > > > > details UI, the first one is supposed to be "required").
> > >> >> > Additionally,
> > >> >> > > in
> > >> >> > > > > FLIP-495 we agreed on four metrics: previous, sufficient,
> > >> desired
> > >> >> and
> > >> >> > > > > acquired resources (for parallelism and profile). Should we
> > use
> > >> >> those
> > >> >> > > in
> > >> >> > > > > the UI as well? We might want to add tool tips to the
> headers
> > >> as
> > >> >> well
> > >> >> > > to
> > >> >> > > > > add a description for each of the metrics.
> > >> >> > > > >
> > >> >> > > > >  Rescale History UI
> > >> >> > > > >
> > >> >> > > > > The history looks nice. What about making the duration of the
> > >> >> > > > > inProgress rescales dynamic, i.e. counting the seconds up from
> > >> >> > > > > the start time? Keeping the NA is also fine if the dynamic
> > >> >> > > > > approach is too complicated.
> > >> >> > > > >
> > >> >> > > > > Best,
> > >> >> > > > > Matthias
> > >> >> > > > >
> > >> >> > > > > On Wed, Nov 5, 2025 at 11:24 AM Yuepeng Pan <[email protected]> wrote:
> > >> >> > > > >
> > >> >> > > > > > Bumping this thread. Thanks!
> > >> >> > > > > >
> > >> >> > > > > > Best regards,
> > >> >> > > > > > Yuepeng Pan
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > > > On 2025/09/02 15:41:07 Yuepeng Pan wrote:
> > >> >> > > > > > > Hi, community.
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > At present, FLIP-495[1][2] has gone through a new round
> > of
> > >> >> > > > discussions
> > >> >> > > > > > and a preliminary general consensus has been reached,
> which
> > >> >> > provides
> > >> >> > > > the
> > >> >> > > > > > necessary premise for the discussion of the current
> > >> FLIP-487[3].
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Therefore, I would like to resume the discussion on the
> > >> >> current
> > >> >> > > FLIP.
> > >> >> > > > > > >
> > >> >> > > > > > > The version of the current FLIP mainly covers and has
> > >> >> completed
> > >> >> > the
> > >> >> > > > > > following two aspects of design:
> > >> >> > > > > > > - The REST API design for querying rescale history
> > >> information
> > >> >> > > > > > > - The Web UI design for showing rescale history
> > information
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Looking forward to your comments and suggestions.
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > [1] https://lists.apache.org/thread/t3r9wdd5gpbqnvzw35kb3wb3d9brpnon
> > >> >> > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> > >> >> > > > > > > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Best regards,
> > >> >> > > > > > > Yuepeng Pan
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > ---- Replied Message ----
> > >> >> > > > > > > | From | Matthias Pohl<[email protected]> |
> > >> >> > > > > > > | Date | 12/2/2024 16:59 |
> > >> >> > > > > > > | To | <[email protected]> |
> > >> >> > > > > > > | Subject | Re: [DISCUSS] FLIP-487: Show history of
> > >> rescales
> > >> >> in
> > >> >> > Web
> > >> >> > > > UI
> > >> >> > > > > > for AdaptiveScheduler |
> > >> >> > > > > > > Hi Yuepeng,
> > >> >> > > > > > > thanks for the proposal. Having a way to see the
> history
> > of
> > >> >> > > rescales
> > >> >> > > > > is a
> > >> >> > > > > > > nice feature, I guess. I went over the draft and have a
> > few
> > >> >> > > > questions:
> > >> >> > > > > > >
> > >> >> > > > > > > Can we reorganize the draft? Right now, we have some
> (for
> > >> >> > > > RescaleEvent,
> > >> >> > > > > > > Required/AcquiredParallelism) schema defined in the
> > >> "Proposed
> > >> >> > > > Changes"
> > >> >> > > > > > > section and some other schema under "Public
> Interfaces".
> > It
> > >> >> would
> > >> >> > > be
> > >> >> > > > > nice
> > >> >> > > > > > > to have this more organized.
> > >> >> > > > > > > Just as a suggestion: In the end the proposed changes
> > >> should
> > >> >> list
> > >> >> > > the
> > >> >> > > > > > > different REST endpoints you want to introduce
> (including
> > >> the
> > >> >> > > > > > corresponding
> > >> >> > > > > > > schemas for request and response).
> > >> >> > > > > > > ---
> > >> >> > > > > > > I'm also wondering whether it would make sense to focus
> > on
> > >> the
> > >> >> > REST
> > >> >> > > > > > > endpoints in this FLIP and put the UI work in a
> separate
> > >> FLIP.
> > >> >> > > WDYT?
> > >> >> > > > > > > Decreasing the scope would probably help handling the
> > >> required
> > >> >> > > > changes.
> > >> >> > > > > > > ---
> > >> >> > > > > > > Have you considered adding the onChange event timestamp
> > >> >> > > > > > > for a rescale event as well? We introduced a separation of
> > >> >> > > > > > > the job requirements change event and the actual rescale
> > >> >> > > > > > > execution in FLIP-461 [1]. It might be worth documenting
> > >> >> > > > > > > the time at which the change that triggered the rescale
> > >> >> > > > > > > was first observed. WDYT?
> > >> >> > > > > > > ---
> > >> >> > > > > > > You're mentioning "comments" as a field of the
> > >> >> > > > > > > RescaleEvent in your proposal. What's the use-case here?
> > >> >> > > > > > > Where are these comments from?
> > >> >> > > > > > >
> > >> >> > > > > > > (update)
> > >> >> > > > > > > A brief talk with Yuepeng on that topic revealed that the
> > >> >> > > > > > > field is supposed to be used for errors that occurred
> > >> >> > > > > > > during the rescale operation. My take on that one:
> > >> >> > > > > > > - We might want to reconsider the field name in that case
> > >> >> > > > > > >   (maybe errors_during_rescale?). "comments" seems to be
> > >> >> > > > > > >   quite generic.
> > >> >> > > > > > > - Additionally, shouldn't we make this a list of errors
> > >> >> > > > > > >   rather than a String field?
> > >> >> > > > > > > - How certain are we that we can attribute errors to the
> > >> >> > > > > > >   actual rescale operation rather than to something else?
> > >> >> > > > > > > ---
> > >> >> > > > > > > In the schema of the RescaleEvent you describe the three
> > >> >> > > > > > > different ID/numbers in the following way:
> > >> >> > > > > > >
> > >> >> > > > > > > The ‘id’ is automatically incremental, The
> > >> >> > > > > > > rescaleAttemptId is generated based on one specified
> > >> >> > > > > > > resource-requirement and the attempt number is generated
> > >> >> > > > > > > based on rescaleAttemptId.
> > >> >> > > > > > >
> > >> >> > > > > > > But there is no "attempt number" mentioned in the
> > >> >> > > > > > > RescaleEvent schema. Additionally, what is the ID based
> > >> >> > > > > > > on? Do we start from 0 and just increment? Or do we want
> > >> >> > > > > > > to have a mechanism that ensures that the IDs are also
> > >> >> > > > > > > unique/monotonically increasing after JobManager
> > >> >> > > > > > > failovers?
> > >> >> > > > > > > ---
> > >> >> > > > > > > For the parallelism schema: I might be misreading the
> > >> >> > > > > > > draft here, but you're proposing to use the subtask name
> > >> >> > > > > > > as the ID to refer to the JobVertex? That name might
> > >> >> > > > > > > become quite long. What about using the JobVertexID here?
> > >> >> > > > > > > That would also be more aligned with how the parallelism
> > >> >> > > > > > > is represented by the
> > >> >> > > > > > > /jobs/<job-id>/resource-requirements endpoint. If we want
> > >> >> > > > > > > to add the task name for readability purposes, we can
> > >> >> > > > > > > still add it as a taskName field to the
> > >> >> > > > > > > Required/AcquiredParallelism schema.
> > >> >> > > > > > > ---
> > >> >> > > > > > > Status field:
> > >> >> > > > > > > - What is the meaning of "TRYING"? I guess, we're more or
> > >> >> > > > > > >   less using the AdaptiveScheduler states here, aren't we?
> > >> >> > > > > > >   Can't we align/stick to the naming that's defined in the
> > >> >> > > > > > >   AdaptiveScheduler state?
> > >> >> > > > > > > ---
> > >> >> > > > > > > Do we really need a new REST endpoint for the
> > >> >> > > > > > > configuration? Can't we get the provided information
> > >> >> > > > > > > already from the existing configuration endpoint? That
> > >> >> > > > > > > said, I still find it useful to have a config tab in the
> > >> >> > > > > > > UI at the end.
> > >> >> > > > > > > ---
> > >> >> > > > > > > For the summary endpoint: I see similarities to the
> > >> >> > > > > > > checkpoint summary here. Not sure whether you already
> > >> >> > > > > > > considered that but would it make sense to align the field
> > >> >> > > > > > > names in some way to have a consistent look-and-feel? I'm
> > >> >> > > > > > > also wondering whether it makes sense to align the schema
> > >> >> > > > > > > to have something like latest rescale, failed rescale, ...
> > >> >> > > > > > >
> > >> >> > > > > > > Best,
> > >> >> > > > > > > Matthias
> > >> >> > > > > > >
> > >> >> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler
> > >> >> > > > > > >
> > >> >> > > > > > > On Mon, Nov 25, 2024 at 11:24 AM yuanfeng hu <[email protected]> wrote:
> > >> >> > > > > > >
> > >> >> > > > > > > +1, I think this feature is very useful for adaptive
> > >> >> > > > > > > scheduler.
> > >> >> > > > > > >
> > >> >> > > > > > > On Fri, Nov 22, 2024 at 18:38, Yuepeng Pan <[email protected]> wrote:
> > >> >> > > > > > >
> > >> >> > > > > > > Hi community,
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Currently, the Adaptive Scheduler already supports the
> > >> >> > > > > > > REST API to manually adjust[1] the parallelism of jobs,
> > >> >> > > > > > > which enhances the functionality of the Adaptive
> > >> >> > > > > > > Scheduler.
> > >> >> > > > > > >
> > >> >> > > > > > > However, Adaptive Scheduler doesn't support displaying or
> > >> >> > > > > > > tracing the rescale history yet[2].
> > >> >> > > > > > >
> > >> >> > > > > > > This makes it inconvenient for users/devs to quickly
> > >> >> > > > > > > obtain some internal information about the rescale history
> > >> >> > > > > > > of the Adaptive Scheduler.
> > >> >> > > > > > >
> > >> >> > > > > > > And showing the history of rescale events of
> > >> >> > > > > > > AdaptiveScheduler in the web UI is very useful for users
> > >> >> > > > > > > to make the next step for jobs.
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Therefore, I created the FLIP-487[3] doc to support
> > >> >> > > > > > > 'Show history of rescales in Web UI for
> > >> >> > > > > > > AdaptiveScheduler'.
> > >> >> > > > > > >
> > >> >> > > > > > > Please refer to the google document[3] for more details
> > >> >> > > > > > > about the proposed design and implementation.
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Looking forward to any feedback and opinions on this
> > >> >> > > > > > > proposal.
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > >> >> > > > > > >
> > >> >> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-22258
> > >> >> > > > > > >
> > >> >> > > > > > > [3] https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?tab=t.0
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Thank you very much.
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > Best,
> > >> >> > > > > > >
> > >> >> > > > > > > Regards.
> > >> >> > > > > > >
> > >> >> > > > > > > Yuepeng Pan
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > --
> > >> >> > > > > > > Best,
> > >> >> > > > > > > Yuanfeng
