Re: [DISCUSS] FLIP-487: Show history of rescales in Web UI for AdaptiveScheduler

Matthias Pohl Sun, 04 Jan 2026 23:17:21 -0800

Thank you. Nothing to add from my side aside from the following cosmetic
items:
- I guess you don't have to add the entire old section with the
screenshots to the "Rejected Alternatives". The summary paragraph is good
enough.
- There's a duplicated sentence under "The Web UI and REST interfaces"
> The design of the rescale history UI will follow the style of the
> checkpoints-related pages.
> But the design of the rescale history REST API will follow the style of
> the checkpoints-related interfaces.


Matthias

On Fri, Jan 2, 2026 at 6:19 PM Yuepeng Pan <[email protected]> wrote:

> Hi, Matthias.
> No worries~ and thank you very much for your comments.
>
> I made some adjustments based on your suggestions.
>
> > - The link to the sketch (section "The Web UI and REST interfaces") could
> > be removed. We should add any missing screenshots to the FLIP and not
> > rely on external resources.
>
> Deleted. All of the UI pages are pasted into the wiki page.
> In the original versions, all relevant pages had already been posted to
> the wiki; I have only removed the source file URLs.
>
> > - Maybe, add to the "Rescale Overview UI" section that the goal is to have
> > the rescale overview aligned with the checkpoint overview
> > - For the /jobs/:jobid/rescales endpoint, splitting it up into three
> > endpoints /jobs/:jobid/rescales/{summary,history,overview} might be a good
> > idea. For /config, we do it like that. But I also see the point of keeping
> > it as you proposed because we said we want to be close to what the
> > checkpoint REST endpoint and UI provides. Your call - you can list the
> > option that you didn't go for under "Rejected Alternatives" to give more
> > context around the goal that we wanted to keep the Rescale UI/REST API
> > close to what is available for checkpoints.
>
> The idea you mentioned makes sense to me,
> and I have updated the corresponding part based on your opinion.
> PTAL~
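For illustration only, a small Python sketch of the two URL layouts weighed here: the single /jobs/:jobid/rescales endpoint versus the split /jobs/:jobid/rescales/{summary,history,overview} variant. Both paths come from this thread's proposal and are not an existing Flink REST API; the helper name rescale_endpoints is hypothetical.

```python
from urllib.parse import quote


def rescale_endpoints(base_url: str, job_id: str, split: bool) -> list:
    """Return the proposed rescale REST URLs for a job (hypothetical helper).

    split=False: one endpoint returning summary, history and overview together,
                 mirroring the style of the checkpoint endpoint.
    split=True:  one sub-resource per concern, similar to what is done for /config.
    """
    root = f"{base_url.rstrip('/')}/jobs/{quote(job_id, safe='')}/rescales"
    if not split:
        return [root]
    return [f"{root}/{part}" for part in ("summary", "history", "overview")]


print(rescale_endpoints("http://localhost:8081", "abc123", split=True))
```

Port 8081 is Flink's default REST port; adjust the base URL to the actual deployment.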
>
> > - Under "Rescale Details UI" you added a sentence (below the screenshot)
> > that feels like it should be fixed: "the items need todo keep same as
> > mentioned Rescale Overview UI"
>
> Deleted.
>
> > - You can add a self-explanatory description for "Compatibility,
> > Deprecation, and Migration Plan" (e.g. No previous work needs to be
> > considered)
> > - Test Plan: REST endpoints will be tested with the RestHandler framework.
> > The UI will be tested visually through manual testing, I guess.
>
> Done.
>
>
> I'd appreciate any input.
>
> Best regards,
> Yuepeng Pan
>
>
> On Sat, Jan 3, 2026 at 00:15, Matthias Pohl via dev <[email protected]> wrote:
>
>> Looks like I mixed things up when replying to your message and it ended up
>> in the wrong thread. Apologies for the confusion. See my message below:
>>
>> Happy New Year to you, too. I have nothing major to add here. Just a few
>> minor things:
>>
>> - The link to the sketch (section "The Web UI and REST interfaces") could
>> be removed. We should add any missing screenshots to the FLIP and not rely
>> on external resources.
>> - Maybe, add to the "Rescale Overview UI" section that the goal is to have
>> the rescale overview aligned with the checkpoint overview
>> - For the /jobs/:jobid/rescales endpoint, splitting it up into three
>> endpoints /jobs/:jobid/rescales/{summary,history,overview} might be a good
>> idea. For /config, we do it like that. But I also see the point of keeping
>> it as you proposed because we said we want to be close to what the
>> checkpoint REST endpoint and UI provides. Your call - you can list the
>> option that you didn't go for under "Rejected Alternatives" to give more
>> context around the goal that we wanted to keep the Rescale UI/REST API
>> close to what is available for checkpoints.
>> - Under "Rescale Details UI" you added a sentence (below the screenshot)
>> that feels like it should be fixed: "the items need todo keep same as
>> mentioned Rescale Overview UI"
>> - You can add a self-explanatory description for "Compatibility,
>> Deprecation, and Migration Plan" (e.g. No previous work needs to be
>> considered)
>> - Test Plan: REST endpoints will be tested with the RestHandler framework.
>> The UI will be tested visually through manual testing, I guess.
>>
>> Best,
>> Matthias
>>
>> On Wed, Dec 31, 2025 at 5:37 PM Yuepeng Pan <[email protected]>
>> wrote:
>>
>> > Hi, Matthias.
>> > Thank you for your review and Happy New Year!
>> >
>> >
>> > a. About JSON schema:
>> >
>> > > You are right. Existing fields shouldn't be modified. Only for new ones,
>> > > we can make sure to not introduce more inconsistencies.
>> >
>> > > In general, the problem is that the JSON formatting is not specified in
>> > > the coding guidelines. That's why it comes as no surprise that these
>> > > formatting inconsistencies exist. We would need to start a discussion on
>> > > updating the Flink coding guidelines first. Only afterwards, we could fix
>> > > the formatting.
>> >
>> > > Such a change would need to be rolled out as part of a major version
>> > > (e.g. 3.0) only, though.
>> >
>> > Thanks for your confirmation & ideas.
>> > That sounds good to me!
>> >
>> > I’ve created a new Jira ticket [1] so that community contributors can
>> > track this new, independent piece of work.
>> >
>> >
>> > b. About the durationInMillis attribute
>> >
>> > Thanks for your response.
>> > I removed durationInMillis from the corresponding JSON schemas of the
>> > REST API and added a description explaining why 'durationInMillis' was
>> > dropped.
>> >
>> >
>> > Any input is appreciated!
>> >
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-38853
>> >
>> >
>> > Best regards,
>> > Yuepeng Pan
>> >
>> >
>> >
>> > On Wed, Dec 31, 2025 at 22:34, Matthias Pohl <[email protected]> wrote:
>> >
>> > > Thanks for the quick response. I added my responses inline. PTAL
>> > >
>> > > Best,
>> > > Matthias
>> > >
>> > > On Mon, 22 Dec 2025, 01:02 Yuepeng Pan, <[email protected]>
>> wrote:
>> > >
>> > > > Hi Matthias, I'm glad to see your email,
>> > > > and thank you very much for your review and comments.
>> > > >
>> > > > To make reading and discussion easier,
>> > > > I have grouped related questions together as much as possible
>> > > > when organizing my responses to your comments.
>> > > > I hope this does not cause any inconvenience.
>> > > >
>> > > >
>> > > > 1. Reference typo & format.
>> > > >
>> > > >
>> > > > > Adaptive Scheduler will support record and query the rescale history
>> > > > > in[2]
>> > > > > Shouldn't it refer to reference #3, i.e. FLIP-495?
>> > > > > nit: In the wiki, we do not need to add the references but use links
>> > > > > with proper link text (e.g. in the motivation paragraph). That should
>> > > > > improve readability.
>> > > >
>> > > > Thanks for catching this and for the suggestions. That makes sense to me.
>> > > > I corrected the citation errors and reformatted the references
>> > > > you mentioned throughout the entire document.
>> > > >
>> > > >
>> > > > 2. Schemas:
>> > > >
>> > > > a. schema of the response for /jobs/overview
>> > > >
>> > > > > extended schema of the response for /jobs/overview
>> > > >
>> > > > > The extract of the schema extension is not precise: We should show
>> > > > > that the new fields are added to the item type
>> > > > > (urn:jsonschema:org:apache:flink:runtime:messages:webmonitor:JobDetails).
>> > > > > About the field name formatting of "job-type": We still do not have
>> > > > > this one included in the code convention. But AFAIS, we usually follow
>> > > > > camelCase format rather than kebab-casing. But especially the Job
>> > > > > overview uses both already.
>> > > >
>> > > > Thanks for the comments.
>> > > > That sounds good to me.
>> > > > I have updated the accompanying changes to the JobDetails class
>> > > > accordingly.
>> > > >
>> > > > b. schema of response for /jobs/:jobid/rescales
>> > > >
>> > > > > Schema of response for /jobs/:jobid/rescales
>> > > > > I noticed that also for the other JSON schemas, we jump between
>> > > > > formats (even introducing snake_casing). Let's unify them and stick
>> > > > > to camelCase. WDYT?
>> > > >
>> > > > Nice idea!
>> > > > Considering compatibility and the amount of work associated with this
>> > > > FLIP, existing fields are not modified in the current FLIP;
>> > > > only the newly introduced fields follow the camelCase naming
>> > > > convention. I updated the schema lines that need to change.
>> > >
>> > >
>> > > > Regarding renaming the existing (as opposed to newly introduced)
>> > > > fields in the schemas modified by this FLIP: do we need a new FLIP
>> > > > to address and unify such work?
>> > > > That way, the new FLIP would focus solely on this type of task.
>> > > > What do you think?
>> > > >
>> > >
>> > > You are right. Existing fields shouldn't be modified. Only for new ones,
>> > > we can make sure to not introduce more inconsistencies.
>> > >
>> > > In general, the problem is that the JSON formatting is not specified in
>> > > the coding guidelines. That's why it comes as no surprise that these
>> > > formatting inconsistencies exist. We would need to start a discussion on
>> > > updating the Flink coding guidelines first. Only afterwards, we could fix
>> > > the formatting.
>> > >
>> > > Such a change would need to be rolled out as part of a major version
>> > > (e.g. 3.0) only, though.
>> > >
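A minimal Python sketch of the camelCase convention agreed above for newly introduced fields. The helper to_camel_case is hypothetical and only illustrates how the snake_case and kebab-case names mentioned in this thread map to camelCase; per the discussion, existing fields keep their current names until a major release.

```python
import re


def to_camel_case(name: str) -> str:
    """Convert a snake_case or kebab-case field name to camelCase.

    Names that are already camelCase (no '_' or '-') are returned unchanged.
    """
    head, *rest = re.split(r"[_-]", name)
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


for field in ("job-type", "ignored_rescales_duration_stats", "inProgress"):
    print(field, "->", to_camel_case(field))
```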
>> > >
>> > > > c. For "summary.rescaleCounts"
>> > > >
>> > > > > For "summary.rescaleCounts", we might not need to add the "_rescales"
>> > > > > suffix to the record fields since the parent indicates already that
>> > > > > all of the fields are rescale counts. We, therefore, could use
>> > > > > "inProgress", "ignored", "completed", "failed".
>> > > >
>> > > > Yes, this indeed makes the expression more concise and to the point.
>> > > > I updated this part.
>> > > >
>> > > > > Do we see value in adding the total
>> > > > > value? That could be easily calculated using the other four metrics.
>> > > > > Hence, I think we can consider it as being redundant and remove it.
>> > > >
>> > > > This is acceptable, as the main difference lies in
>> > > > whether the total value is calculated on the frontend or on the backend.
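Since the total is redundant, a client can derive it from the four counts. A minimal Python sketch, assuming summary.rescaleCounts carries the camelCase keys inProgress, ignored, completed and failed proposed above (the helper name total_rescales is hypothetical):

```python
# Keys of summary.rescaleCounts as proposed in this thread (assumption, not a released API).
RESCALE_COUNT_KEYS = ("inProgress", "ignored", "completed", "failed")


def total_rescales(rescale_counts: dict) -> int:
    """Sum the four per-state counts; missing keys are treated as 0."""
    return sum(int(rescale_counts.get(key, 0)) for key in RESCALE_COUNT_KEYS)


summary = {"rescaleCounts": {"inProgress": 1, "ignored": 2, "completed": 5, "failed": 1}}
print(total_rescales(summary["rescaleCounts"]))  # 9
```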
>> > > >
>> > > > d. rescalesDurationStats/rescales_duration_stats (the previous version)
>> > > >
>> > > > > "rescales_duration_stats"
>> > > > > For all the "durationStats"? Can we add the time unit to make things
>> > > > > clearer, e.g. "rescalesDurationStats" becomes
>> > > > > "rescalesDurationStatsInMillis"? ...same applies to the timestamps
>> > > >
>> > > > Good idea~.
>> > > > I updated the descriptions of all timestamp-related attributes.
>> > > > Please take a look!
>> > > >
>> > > > e. ignoredRescalesDurationStats/ignored_rescales_duration_stats (the
>> > > > previous version)
>> > > >
>> > > > > "ignored_rescales_duration_stats"
>> > > > > Are the stats useful for rescales which were actually not executed?
>> > > >
>> > > > This question is a bit difficult for me to answer.
>> > > > In theory, since rescale operations of the Ignored type can occur,
>> > > > it is reasonable to include them in the statistics, at least
>> > > > from the perspective of having a complete set of dimensions.
>> > > > In addition, I'm not certain whether users truly do not care
>> > > > about statistics for this type of data.
>> > > > Therefore, I kept it in the initial design document.
>> > > > If you think it is unnecessary to retain this data,
>> > > > we can exclude Ignored rescales from the duration statistics.
>> > > > I would appreciate your experience and opinion on this.
>> > >
>> > >
>> > > Fair enough.
>> > >
>> > > > f. The durationInMillis attribute.
>> > >
>> > >
>> > > > > duration
>> > > > > Rescale details already contain the start and end time. Adding the
>> > > > > duration here shouldn't be necessary.
>> > > >
>> > > > If the frontend page does not involve overly complex display logic,
>> > > > adding an additional durationInMillis field here should be
>> > > > unnecessary.
>> > > >
>> > >
>> > > Just to clarify: I don't suggest removing the duration information from
>> > > the web UI. It's only obsolete in the REST API because it can be
>> > > calculated on the client side.
>> > >
>> > >
>> > > >
>> > > > 3. UI
>> > > >
>> > > > a. Rescale History UI (related to 'durationInMillis' attribute)
>> > > >
>> > > > > Rescale History UI
>> > > > > The history looks nice. What about making the duration of the
>> > > > > inProgress rescales dynamic, i.e. counting the seconds up from the
>> > > > > start time? Keeping the NA is also fine if the dynamic approach is
>> > > > > too complicated.
>> > > >
>> > > > As far as I can tell,
>> > > > this is feasible from an implementation perspective,
>> > > > though it may require some adjustments.
>> > > > If we remove the durationInMillis field from the rescale,
>> > > > the frontend would need to perform some additional processing when
>> > > > displaying the data.
>> > > > For example:
>> > > > rescale{terminalState=inProgress, startTimestampInMillis=1,
>> > > > endTimestampInMillis=null, durationInMillis=3}
>> > > > If we keep the durationInMillis field, the frontend would need almost
>> > > > no logic and could simply display the data as is.
>> > > > If we do not keep the durationInMillis field, the frontend would need
>> > > > to do two things when rendering:
>> > > >   - Calculate durationInMillis based on startTimestampInMillis and
>> > > > endTimestampInMillis
>> > > >   - When displaying records with terminalState = inProgress, show
>> > > > endTimestampInMillis as null
>> > > >
>> > > > Similarly, regarding durationInMillis in schedulerState,
>> > > > I'm not sure whether such scenarios would arise,
>> > > > and we have not yet considered
>> > > > whether this data should be displayed in the same way as
>> > > > Rescale.durationInMillis.
>> > > > Although the difference is small,
>> > > > it is worth clarifying so that we can better evaluate the decision.
>> > > >
>> > > > Therefore, please let me know your thoughts on the following options:
>> > > > - Keep the durationInMillis field for both Rescale and schedulerState
>> > > > in the schema.
>> > > > - Show N/A as the duration of in-progress rescales and remove
>> > > > durationInMillis from the related sub-JSON.
>> > > > - Or something else you consider reasonable.
>> > > >
>> > >
>> > > As mentioned in 2.f), I would remove the duration and calculate it
>> > > dynamically in the client code. It shouldn't be too complex an operation
>> > > and allows us to keep the duration dynamic for rescales in progress.
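A minimal sketch of that client-side calculation, written in Python for brevity (the Web UI would do the equivalent in its frontend code). It assumes the startTimestampInMillis/endTimestampInMillis fields discussed above and that an in-progress rescale has a null end timestamp; the function name rescale_duration_millis is hypothetical. Re-evaluating it periodically with the current time gives the dynamic, counting-up duration for in-progress rescales.

```python
import time
from typing import Optional


def rescale_duration_millis(
    start_timestamp_ms: int,
    end_timestamp_ms: Optional[int],
    now_ms: Optional[int] = None,
) -> int:
    """Compute a rescale's duration on the client.

    Finished rescales use their end timestamp. In-progress rescales (end is None)
    are measured up to "now", so repeated calls yield a growing duration.
    Negative results (e.g. from clock skew) are clamped to 0.
    """
    if end_timestamp_ms is None:
        end_timestamp_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    return max(0, end_timestamp_ms - start_timestamp_ms)


print(rescale_duration_millis(1_000, 4_000))               # 3000 (completed)
print(rescale_duration_millis(1_000, None, now_ms=2_500))  # 1500 (in progress)
```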
>> > >
>> > >
>> > > > b. Rescale Overview UI.
>> > > >
>> > > > > Rescale Overview UI
>> > > > > The screenshot shows "Acquired profile" twice for the slot (based on
>> > > > > the details UI, the first one is supposed to be "required").
>> > > >
>> > > > Sorry for the typo. I corrected it.
>> > > >
>> > > > > Additionally, in
>> > > > > FLIP-495 we agreed on four metrics: previous, sufficient, desired and
>> > > > > acquired resources (for parallelism and profile). Should we use those
>> > > > > in the UI as well?
>> > > >
>> > > > Okay. Updated it in the related UI draft pages.
>> > > >
>> > > > > We might want to add tooltips to the headers as well to
>> > > > > add a description for each of the metrics.
>> > > >
>> > > > > Could we add tooltips to the headers of the rescale overview to
>> > > > > describe the different IDs?
>> > > >
>> > > > Yes, the suggestion is reasonable.
>> > > > I added descriptions of the tooltip messages for some core header
>> > > > attributes after the corresponding UI draft pages.
>> > > > Looking forward to your opinion.
>> > > >
>> > > > 4. Items newly added by me:
>> > > > I have added notes after some sections of the core UI pages about
>> > > > limiting the displayed length of UUID-type identifiers and about
>> > > > issues related to task names.
>> > > >
>> > > > I'd greatly appreciate any suggestions you may have.
>> > > >
>> > > >
>> > > > Best regards,
>> > > > Yuepeng Pan
>> > > >
>> > > >
>> > > > On Thu, Dec 18, 2025 at 18:08, Matthias Pohl <[email protected]> wrote:
>> > > >
>> > > > > Hi Yuepeng,
>> > > > > I finally found some time to look into that FLIP again. Sorry for the
>> > > > > delay. Thanks for working on this topic and pushing it. Here are a few
>> > > > > more comments on the current state of FLIP-487:
>> > > > >
>> > > > > Adaptive Scheduler will support record and query the rescale history
>> > > > > in[2].
>> > > > >
>> > > > > Shouldn't it refer to reference #3, i.e. FLIP-495?
>> > > > >
>> > > > > nit: In the wiki, we do not need to add the references but use links
>> > > > > with proper link text (e.g. in the motivation paragraph). That should
>> > > > > improve readability.
>> > > > >
>> > > > > extended schema of the response for /jobs/overview
>> > > > >
>> > > > > The extract of the schema extension is not precise: We should show
>> > > > > that the new fields are added to the item type
>> > > > > (urn:jsonschema:org:apache:flink:runtime:messages:webmonitor:JobDetails).
>> > > > > About the field name formatting of "job-type": We still do not have
>> > > > > this one included in the code convention. But AFAIS, we usually follow
>> > > > > camelCase format rather than kebab-casing. But especially the Job
>> > > > > overview uses both already.
>> > > > >
>> > > > > Could we add tooltips to the headers of the rescale overview to
>> > > > > describe the different IDs?
>> > > > >
>> > > > > Schema of response for /jobs/:jobid/rescales
>> > > > >
>> > > > > I noticed that also for the other JSON schemas, we jump between
>> > > > > formats (even introducing snake_casing). Let's unify them and stick
>> > > > > to camelCase. WDYT?
>> > > > >
>> > > > > For "summary.rescaleCounts", we might not need to add the "_rescales"
>> > > > > suffix to the record fields since the parent indicates already that
>> > > > > all of the fields are rescale counts. We, therefore, could use
>> > > > > "inProgress", "ignored", "completed", "failed". Do we see value in
>> > > > > adding the total value? That could be easily calculated using the
>> > > > > other four metrics. Hence, I think we can consider it as being
>> > > > > redundant and remove it.
>> > > > >
>> > > > > "rescales_duration_stats"
>> > > > >
>> > > > > For all the "durationStats"? Can we add the time unit to make things
>> > > > > clearer, e.g. "rescalesDurationStats" becomes
>> > > > > "rescalesDurationStatsInMillis"? ...same applies to the timestamps
>> > > > >
>> > > > > "ignored_rescales_duration_stats"
>> > > > >
>> > > > > Are the stats useful for rescales which were actually not executed?
>> > > > >
>> > > > > duration
>> > > > >
>> > > > > Rescale details already contain the start and end time. Adding the
>> > > > > duration here shouldn't be necessary.
>> > > > >
>> > > > > Rescale Overview UI
>> > > > >
>> > > > >
>> > > > > The screenshot shows "Acquired profile" twice for the slot (based on
>> > > > > the details UI, the first one is supposed to be "required").
>> > > > > Additionally, in FLIP-495 we agreed on four metrics: previous,
>> > > > > sufficient, desired and acquired resources (for parallelism and
>> > > > > profile). Should we use those in the UI as well? We might want to add
>> > > > > tooltips to the headers as well to add a description for each of the
>> > > > > metrics.
>> > > > >
>> > > > > Rescale History UI
>> > > > >
>> > > > > The history looks nice. What about making the duration of the
>> > > > > inProgress rescales dynamic, i.e. counting the seconds up from the
>> > > > > start time? Keeping the NA is also fine if the dynamic approach is too
>> > > > > complicated.
>> > > > >
>> > > > > Best,
>> > > > > Matthias
>> > > > >
>> > > > > On Wed, Nov 5, 2025 at 11:24 AM Yuepeng Pan <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Bumping this thread. Thanks!
>> > > > > >
>> > > > > > Best regards,
>> > > > > > Yuepeng Pan
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On 2025/09/02 15:41:07 Yuepeng Pan wrote:
>> > > > > > > Hi, community.
>> > > > > > >
>> > > > > > >
>> > > > > > > At present, FLIP-495 [1][2] has gone through a new round of
>> > > > > > > discussion and a preliminary consensus has been reached, which
>> > > > > > > provides the necessary basis for discussing the current
>> > > > > > > FLIP-487 [3].
>> > > > > > >
>> > > > > > >
>> > > > > > > Therefore, I would like to resume the discussion on the current
>> > > > > > > FLIP.
>> > > > > > >
>> > > > > > > The current version of the FLIP mainly covers, and has completed,
>> > > > > > > the design of the following two aspects:
>> > > > > > > - The REST API design for querying rescale history information
>> > > > > > > - The Web UI design for showing rescale history information
>> > > > > > >
>> > > > > > >
>> > > > > > > Looking forward to your comments and suggestions.
>> > > > > > >
>> > > > > > >
>> > > > > > > [1] https://lists.apache.org/thread/t3r9wdd5gpbqnvzw35kb3wb3d9brpnon
>> > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
>> > > > > > > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
>> > > > > > >
>> > > > > > >
>> > > > > > > Best regards,
>> > > > > > > Yuepeng Pan
>> > > > > > >
>> > > > > > >
>> > > > > > > ---- Replied Message ----
>> > > > > > > | From | Matthias Pohl<[email protected]> |
>> > > > > > > | Date | 12/2/2024 16:59 |
>> > > > > > > | To | <[email protected]> |
>> > > > > > > | Subject | Re: [DISCUSS] FLIP-487: Show history of rescales in Web UI for AdaptiveScheduler |
>> > > > > > > Hi Yuepeng,
>> > > > > > > thanks for the proposal. Having a way to see the history of rescales
>> > > > > > > is a nice feature, I guess. I went over the draft and have a few
>> > > > > > > questions:
>> > > > > > >
>> > > > > > > Can we reorganize the draft? Right now, we have some schemas (for
>> > > > > > > RescaleEvent, Required/AcquiredParallelism) defined in the
>> > > > > > > "Proposed Changes" section and some other schemas under "Public
>> > > > > > > Interfaces". It would be nice to have this more organized.
>> > > > > > > Just as a suggestion: In the end, the proposed changes should list
>> > > > > > > the different REST endpoints you want to introduce (including the
>> > > > > > > corresponding schemas for request and response).
>> > > > > > > ---
>> > > > > > > I'm also wondering whether it would make sense to focus on the REST
>> > > > > > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
>> > > > > > > Decreasing the scope would probably help with handling the required
>> > > > > > > changes.
>> > > > > > > ---
>> > > > > > > Have you considered adding the onChange event timestamp for a rescale
>> > > > > > > event as well? We introduced a separation of the job requirements
>> > > > > > > change event and the actual rescale execution in FLIP-461 [1]. It
>> > > > > > > might be worth documenting the time when the change that triggered
>> > > > > > > the rescale was first observed. WDYT?
>> > > > > > > ---
>> > > > > > > You're mentioning "comments" as a field of the RescaleEvent in your
>> > > > > > > proposal. What's the use-case here? Where are these comments from?
>> > > > > > >
>> > > > > > > (update)
>> > > > > > > A brief talk with Yuepeng on that topic revealed that the field is
>> > > > > > > supposed to be used for errors that occurred during the rescale
>> > > > > > > operation. My take on that one:
>> > > > > > > - We might want to reconsider the field name in that case (maybe
>> > > > > > > errors_during_rescale?). "comments" seems to be quite generic.
>> > > > > > > - Additionally, shouldn't we make this a list of errors rather than
>> > > > > > > a String field?
>> > > > > > > - How certain are we that we can associate errors with the actual
>> > > > > > > rescale operation rather than the error being caused by something
>> > > > > > > else?
>> > > > > > > ---
>> > > > > > > In the schema of the RescaleEvent you describe the three different
>> > > > > > > IDs/numbers in the following way:
>> > > > > > >
>> > > > > > > The ‘id’ is automatically incremental, The rescaleAttemptId is
>> > > > > > > generated based on one specified resource-requirement and the
>> > > > > > > attempt number is generated based on rescaleAttemptId.
>> > > > > > >
>> > > > > > > But there is no "attempt number" mentioned in the RescaleEvent
>> > > > > > > schema. Additionally, what is the ID based on? Do we start from 0
>> > > > > > > and just increment? Or do we want to have a mechanism that ensures
>> > > > > > > that the IDs are also unique/monotonically increasing after
>> > > > > > > JobManager failovers?
>> > > > > > > ---
>> > > > > > > For the parallelism schema: I might be misreading the draft here,
>> > > > > > > but are you proposing to use the subtask name as the ID to refer to
>> > > > > > > the JobVertex? The name might become quite long. What about using
>> > > > > > > the JobVertexID here? That would also be more aligned with how the
>> > > > > > > parallelism is represented by the /jobs/<job-id>/resource-requirements
>> > > > > > > endpoint. If we want to add the task name for readability purposes,
>> > > > > > > we can still add it as a taskName field to the
>> > > > > > > Required/AcquiredParallelism schema.
>> > > > > > > ---
>> > > > > > > Status field:
>> > > > > > > - What is the meaning of "TRYING"? I guess, we're more or less using
>> > > > > > > the AdaptiveScheduler states here, aren't we? Can't we align/stick
>> > > > > > > to the naming that's defined in the AdaptiveScheduler state?
>> > > > > > > ---
>> > > > > > > Do we really need a new REST endpoint for the configuration? Can't we
>> > > > > > > get the provided information already from the existing configuration
>> > > > > > > endpoint? That said, I still find it useful to have a config tab in
>> > > > > > > the UI at the end.
>> > > > > > > ---
>> > > > > > > For the summary endpoint: I see similarities to the checkpoint summary
>> > > > > > > here. Not sure whether you already considered that, but would it make
>> > > > > > > sense to align the field names in some way to have a consistent
>> > > > > > > look-and-feel? I'm also wondering whether it makes sense to align the
>> > > > > > > schema to have something like latest rescale, failed rescale, ...
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Matthias
>> > > > > > >
>> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler
>> > > > > > >
>> > > > > > > On Mon, Nov 25, 2024 at 11:24 AM yuanfeng hu <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > +1, I think this feature is very useful for adaptive scheduler.
>> > > > > > >
>> > > > > > > On Fri, Nov 22, 2024 at 18:38, Yuepeng Pan <[email protected]> wrote:
>> > > > > > >
>> > > > > > > Hi community,
>> > > > > > >
>> > > > > > > Currently, the Adaptive Scheduler already supports the REST API
>> > > > > > > to manually adjust [1] the parallelism of jobs, which enhances the
>> > > > > > > functionality of the Adaptive Scheduler.
>> > > > > > > However, the Adaptive Scheduler doesn't support displaying or
>> > > > > > > tracing the rescale history yet [2].
>> > > > > > > This makes it inconvenient for users/devs to quickly obtain internal
>> > > > > > > information about the rescale history of the Adaptive Scheduler.
>> > > > > > > Showing the history of rescale events of the AdaptiveScheduler in
>> > > > > > > the web UI would be very useful for users to decide on the next
>> > > > > > > steps for their jobs.
>> > > > > > >
>> > > > > > > Therefore, I created the FLIP-487 [3] doc to support
>> > > > > > > 'Show history of rescales in Web UI for AdaptiveScheduler'.
>> > > > > > > Please refer to the Google document [3] for more details
>> > > > > > > about the proposed design and implementation.
>> > > > > > >
>> > > > > > > Looking forward to any feedback and opinions on this proposal.
>> > > > > > >
>> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
>> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-22258
>> > > > > > > [3] https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?tab=t.0
>> > > > > > >
>> > > > > > > Thank you very much.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Regards.
>> > > > > > > Yuepeng Pan
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Best,
>> > > > > > > Yuanfeng
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
