Wow, thanks, Matthias, for resurfacing the voting thread [1]. I had almost forgotten about the thread I initiated earlier. In that case, let's proceed with the vote on that thread; nothing could be better.
Thank you for the timely reminder and, as always, your continued support!

[1] https://lists.apache.org/thread/1j5dkz4rzzp6htbo6s1w9c2qsvfjw8to

Best regards,
Yuepeng Pan

On Wed, Jan 7, 2026 at 23:29, Matthias Pohl <[email protected]> wrote:

> There's no need to open another voting thread. I pushed the existing one
> [1] for FLIP-487 [2].
> Thanks again for driving this, Yuepeng.
>
> Best,
> Matthias
>
> [1] https://lists.apache.org/thread/1j5dkz4rzzp6htbo6s1w9c2qsvfjw8to
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
>
> On Tue, Jan 6, 2026 at 4:27 AM Yuepeng Pan <[email protected]> wrote:
>
> > Hi, community.
> >
> > This discussion has been ongoing for some time, and I sincerely appreciate
> > the attention and support from the developers.
> > If there is no further feedback this week, I will initiate a vote next week.
> >
> > Best regards,
> > Yuepeng Pan
> >
> > On Mon, Jan 5, 2026 at 16:30, Yuepeng Pan <[email protected]> wrote:
> >
> > > Thank you, Matthias.
> > >
> > > > - I guess, you don't have to add the entire old section with the
> > > > screenshots to the Rejected alternatives. The summary paragraph is good
> > > > enough
> > >
> > > Yes, I deleted the redundant screenshots and information and kept the core
> > > summary in paragraphs.
> > >
> > > > - There's a duplicated sentence under "The Web UI and REST interfaces"
> > > > > The design of the rescale history UI will follow the style of the
> > > > > checkpoints-related pages.
> > > > > But the design of the rescale history REST API will follow the style
> > > > > of the checkpoints-related interfaces.
> > >
> > > Thanks for your detailed review.
> > > You are right, there are typos.
> > > Updated; please let me try to clarify it:
> > > What I originally meant to express is
> > > 'But the design of the rescale history REST API will not fully follow the
> > > style of the checkpoints-related interfaces.',
> > > because we have refactored the old interface (now located in the rejected
> > > edition) into three new smaller interfaces.
> > >
> > > Best,
> > > Yuepeng Pan
> > >
> > > On Mon, Jan 5, 2026 at 15:17, Matthias Pohl <[email protected]> wrote:
> > >
> > >> Thank you. Nothing to add from my side aside from the following cosmetic
> > >> items:
> > >> - I guess, you don't have to add the entire old section with the
> > >> screenshots to the Rejected alternatives. The summary paragraph is good
> > >> enough
> > >> - There's a duplicated sentence under "The Web UI and REST interfaces"
> > >> > The design of the rescale history UI will follow the style of the
> > >> > checkpoints-related pages.
> > >> > But the design of the rescale history REST API will follow the style of
> > >> > the checkpoints-related interfaces.
> > >>
> > >> Matthias
> > >>
> > >> On Fri, Jan 2, 2026 at 6:19 PM Yuepeng Pan <[email protected]> wrote:
> > >>
> > >> > Hi, Matthias.
> > >> > No worries~ and thank you very much for your comments.
> > >> >
> > >> > I made some adjustments based on your suggestions.
> > >> >
> > >> > > - The link to the sketch (section "The Web UI and REST interfaces") could
> > >> > > be removed. We should add any missing screenshots to the FLIP and not rely
> > >> > > on external resources.
> > >> >
> > >> > Deleted, and all of the UI pages are pasted into the wiki page.
> > >> > In the original versions, all relevant pages had already been posted to
> > >> > the wiki.
> > >> > I have only removed the source file URLs.
> > >> >
> > >> > > - Maybe, add to the "Rescale Overview UI" section that the goal is to have
> > >> > > the rescale overview aligned with the checkpoint overview
> > >> > > - For the /jobs/:jobid/rescales endpoint, splitting it up into three
> > >> > > endpoints /jobs/:jobid/rescales/{summary,history,overview} might be a good
> > >> > > idea. For /config, we do it like that. But I also see the point of keeping
> > >> > > it as you proposed because we said we want to be close to what the
> > >> > > checkpoint REST endpoint and UI provides. Your call - you can list the
> > >> > > option that you didn't go for under "Rejected Alternatives" to give more
> > >> > > context around the goal that we wanted to keep the Rescale UI/REST API
> > >> > > close to what is available for checkpoints.
> > >> >
> > >> > The idea you mentioned makes sense to me.
> > >> > I updated and adapted the corresponding part based on your suggestion.
> > >> > PTAL~
> > >> >
> > >> > > - Under "Rescale Details UI" you added a sentence (below the screenshot)
> > >> > > that feels like it should be fixed: "the items need todo keep same as
> > >> > > mentioned Rescale Overview UI"
> > >> >
> > >> > Deleted.
> > >> >
> > >> > > - You can add a self-explanatory description for "Compatibility,
> > >> > > Deprecation, and Migration Plan" (e.g. No previous work needs to be
> > >> > > considered)
> > >> > > - Test Plan: REST endpoints will be tested with the RestHandler framework.
> > >> > > The UI will be tested visually through manual testing, I guess.
> > >> >
> > >> > Done.
> > >> >
> > >> > I'd appreciate any input.
> > >> >
> > >> > Best regards,
> > >> > Yuepeng Pan
> > >> >
> > >> > On Sat, Jan 3, 2026 at 00:15, Matthias Pohl via dev <[email protected]> wrote:
> > >> >
> > >> >> Looks like I mixed things up when replying to your message and it ended up
> > >> >> in the wrong thread.
> > >> >> Apologies for the confusion. See my message below:
> > >> >>
> > >> >> Happy New Year to you, too. I have nothing major to add here. Just a few
> > >> >> minor things:
> > >> >>
> > >> >> - The link to the sketch (section "The Web UI and REST interfaces") could
> > >> >> be removed. We should add any missing screenshots to the FLIP and not rely
> > >> >> on external resources.
> > >> >> - Maybe, add to the "Rescale Overview UI" section that the goal is to have
> > >> >> the rescale overview aligned with the checkpoint overview
> > >> >> - For the /jobs/:jobid/rescales endpoint, splitting it up into three
> > >> >> endpoints /jobs/:jobid/rescales/{summary,history,overview} might be a good
> > >> >> idea. For /config, we do it like that. But I also see the point of keeping
> > >> >> it as you proposed because we said we want to be close to what the
> > >> >> checkpoint REST endpoint and UI provides. Your call - you can list the
> > >> >> option that you didn't go for under "Rejected Alternatives" to give more
> > >> >> context around the goal that we wanted to keep the Rescale UI/REST API
> > >> >> close to what is available for checkpoints.
> > >> >> - Under "Rescale Details UI" you added a sentence (below the screenshot)
> > >> >> that feels like it should be fixed: "the items need todo keep same as
> > >> >> mentioned Rescale Overview UI"
> > >> >> - You can add a self-explanatory description for "Compatibility,
> > >> >> Deprecation, and Migration Plan" (e.g. No previous work needs to be
> > >> >> considered)
> > >> >> - Test Plan: REST endpoints will be tested with the RestHandler framework.
> > >> >> The UI will be tested visually through manual testing, I guess.
> > >> >>
> > >> >> Best,
> > >> >> Matthias
> > >> >>
> > >> >> On Wed, Dec 31, 2025 at 5:37 PM Yuepeng Pan <[email protected]> wrote:
> > >> >>
> > >> >> > Hi, Matthias.
> > >> >> > Thank you for your review and Happy New Year!
> > >> >> >
> > >> >> > a. About the JSON schema:
> > >> >> >
> > >> >> > > You are right. Existing fields shouldn't be modified. Only for new ones, we
> > >> >> > > can make sure to not introduce more inconsistencies.
> > >> >> > >
> > >> >> > > In general, the problem is that the JSON formatting is not specified in the
> > >> >> > > coding guidelines. That's why it comes with no surprise that these
> > >> >> > > formatting inconsistencies exist. We would need to start a discussion on
> > >> >> > > updating the Flink coding guidelines first. Only afterwards, we could fix
> > >> >> > > the formatting.
> > >> >> > >
> > >> >> > > Such a change would need to be rolled out as part of a major version (e.g.
> > >> >> > > 3.0) only, though.
> > >> >> >
> > >> >> > Thanks for your confirmation and ideas.
> > >> >> > That sounds good to me!
> > >> >> >
> > >> >> > I’ve created a new Jira ticket [1] so that community contributors can track
> > >> >> > this new, independent piece of work.
> > >> >> >
> > >> >> > b. About the durationInMillis attribute
> > >> >> >
> > >> >> > Thanks for your response.
> > >> >> > I removed durationInMillis from the corresponding JSON schema of the REST
> > >> >> > API interfaces and added the required description of the reasoning behind
> > >> >> > the deprecated 'durationInMillis'.
> > >> >> >
> > >> >> > Any input is appreciated!
> > >> >> >
> > >> >> > [1] https://issues.apache.org/jira/browse/FLINK-38853
> > >> >> >
> > >> >> > Best regards,
> > >> >> > Yuepeng Pan
> > >> >> >
> > >> >> > On Wed, Dec 31, 2025 at 22:34, Matthias Pohl <[email protected]> wrote:
> > >> >> >
> > >> >> > > Thanks for the quick response. I added my responses inline.
> > >> >> > > PTAL
> > >> >> > >
> > >> >> > > Best,
> > >> >> > > Matthias
> > >> >> > >
> > >> >> > > On Mon, 22 Dec 2025, 01:02 Yuepeng Pan, <[email protected]> wrote:
> > >> >> > >
> > >> >> > > > Hi, Matthias, I'm glad to see that email.
> > >> >> > > > And thank you very much for your review and comments.
> > >> >> > > >
> > >> >> > > > To facilitate reading and discussion,
> > >> >> > > > I have grouped related questions together as much as possible
> > >> >> > > > when organizing my responses to your comments,
> > >> >> > > > and I hope this will not cause any inconvenience.
> > >> >> > > >
> > >> >> > > > 1. Reference typo & format.
> > >> >> > > >
> > >> >> > > > > Adaptive Scheduler will support record and query the rescale history
> > >> >> > > > > in[2]
> > >> >> > > > > Shouldn't it refer to reference #3, i.e. FLIP-495?
> > >> >> > > > > nit: In the wiki, we do not need to add the references but use links with
> > >> >> > > > > proper link text (e.g. in the motivation paragraph). That should improve
> > >> >> > > > > readability.
> > >> >> > > >
> > >> >> > > > Thanks for catching this and for the suggestions. That makes sense to me.
> > >> >> > > > I corrected and reformatted the citation errors
> > >> >> > > > and reference formats you mentioned throughout the entire document.
> > >> >> > > >
> > >> >> > > > 2. Schemas:
> > >> >> > > >
> > >> >> > > > a. schema of the response for /jobs/overview
> > >> >> > > >
> > >> >> > > > > extended schema of the response for /jobs/overview
> > >> >> > > >
> > >> >> > > > > The extract of the schema extension is not precise: We should show, that
> > >> >> > > > > the new fields are added to the item type
> > >> >> > > > > (urn:jsonschema:org:apache:flink:runtime:messages:webmonitor:JobDetails).
> > >> >> > > > > About the field name formatting of "job-type": We still do not have this
> > >> >> > > > > one included in the code convention. But AFAIS, we usually follow camelCase
> > >> >> > > > > format rather than kebab-casing. But especially the Job overview uses both
> > >> >> > > > > already.
> > >> >> > > >
> > >> >> > > > Thanks for the comments.
> > >> >> > > > That sounds good to me.
> > >> >> > > > I have updated the corresponding accompanying changes to the JobDetails
> > >> >> > > > class.
> > >> >> > > >
> > >> >> > > > b. schema of response for /jobs/:jobid/rescales
> > >> >> > > >
> > >> >> > > > > Schema of response for /jobs/:jobid/rescales
> > >> >> > > > > I noticed that also for the other JSON schemas, we jump between formats
> > >> >> > > > > (even introducing snake_casing). Let's unify them and stick to camelCase.
> > >> >> > > > > WDYT?
> > >> >> > > >
> > >> >> > > > Nice idea!
> > >> >> > > > Considering compatibility and the workload associated with this FLIP,
> > >> >> > > > the existing fields are not modified in the current FLIP;
> > >> >> > > > only the newly introduced fields are named
> > >> >> > > > following the camelCase naming convention.
> > >> >> > > > And I updated the lines about schemas that need to change.
> > >> >> > > >
> > >> >> > > > Regarding the naming style changes for all fields in schemas that are
> > >> >> > > > modified (as opposed to newly introduced) within this FLIP, do we need a
> > >> >> > > > new FLIP to address and unify such work?
> > >> >> > > > This way, the new FLIP would focus solely on this type of task.
> > >> >> > > > What do you think about it?
> > >> >> > >
> > >> >> > > You are right. Existing fields shouldn't be modified.
> > >> >> > > Only for new ones, we
> > >> >> > > can make sure to not introduce more inconsistencies.
> > >> >> > >
> > >> >> > > In general, the problem is that the JSON formatting is not specified in the
> > >> >> > > coding guidelines. That's why it comes with no surprise that these
> > >> >> > > formatting inconsistencies exist. We would need to start a discussion on
> > >> >> > > updating the Flink coding guidelines first. Only afterwards, we could fix
> > >> >> > > the formatting.
> > >> >> > >
> > >> >> > > Such a change would need to be rolled out as part of a major version (e.g.
> > >> >> > > 3.0) only, though.
> > >> >> > >
> > >> >> > > > c. For "summary.rescaleCounts"
> > >> >> > > >
> > >> >> > > > > For "summary.rescaleCounts", we might not need to add the "_rescales"
> > >> >> > > > > suffix to the record fields since the parent indicates already that all of
> > >> >> > > > > the fields are rescale counts. We, therefore, could use "inProgress",
> > >> >> > > > > "ignored", "completed", "failed".
> > >> >> > > >
> > >> >> > > > Yes, this indeed makes the expression more concise and to the point.
> > >> >> > > > I updated this part.
> > >> >> > > >
> > >> >> > > > > Do we see value in adding the total
> > >> >> > > > > value? That could be easily calculated using the other four metrics. Hence,
> > >> >> > > > > I think we can consider it as being redundant and remove it.
> > >> >> > > >
> > >> >> > > > This is acceptable, as the difference lies only in
> > >> >> > > > whether the total value is calculated on the FE side or on the backend.
> > >> >> > > >
> > >> >> > > > d.
> > >> >> > > > rescalesDurationStats/rescales_duration_stats (the previous edition)
> > >> >> > > >
> > >> >> > > > > "rescales_duration_stats"
> > >> >> > > > > For all the "durationStats"? Can we add the time unit to make things
> > >> >> > > > > clearer, e.g. "rescalesDurationStats" becomes
> > >> >> > > > > "rescalesDurationStatsInMillis"? ...same applies to the timestamps
> > >> >> > > >
> > >> >> > > > Good idea~.
> > >> >> > > > I updated the description of all attributes about timestamps.
> > >> >> > > > Please help take a look!
> > >> >> > > >
> > >> >> > > > e. ignoredRescalesDurationStats/ignored_rescales_duration_stats (the
> > >> >> > > > previous edition)
> > >> >> > > >
> > >> >> > > > > "ignored_rescales_duration_stats"
> > >> >> > > > > Are the stats useful for rescales which were actually not executed?
> > >> >> > > >
> > >> >> > > > Answering this question may be a bit difficult for me.
> > >> >> > > > In theory, since rescale operations of the Ignored type can occur,
> > >> >> > > > it is reasonable to include them in the statistics, at least
> > >> >> > > > from the perspective of having a complete set of dimensions.
> > >> >> > > > In addition, I'm not certain whether users truly do not care
> > >> >> > > > about statistics for this type of data.
> > >> >> > > > Therefore, I kept it in the initial design document.
> > >> >> > > > If you think it is unnecessary to retain this data,
> > >> >> > > > we can exclude Ignored rescale types from the duration statistics.
> > >> >> > > > I would appreciate your experience and opinion on this.
> > >> >> > >
> > >> >> > > Fair enough.
> > >> >> > >
> > >> >> > > > f. the durationInMillis attribute.
> > >> >> > > >
> > >> >> > > > > duration
> > >> >> > > > > Rescale details already contain the start and end time. Adding the duration
> > >> >> > > > > here shouldn't be necessary.
> > >> >> > > >
> > >> >> > > > If the frontend page does not involve overly complex display logic,
> > >> >> > > > adding an additional durationInMillis field here should be unnecessary.
> > >> >> > >
> > >> >> > > Just to clarify: I don't suggest removing the duration information from the
> > >> >> > > web UI. It's only obsolete in the REST API because it can be calculated on
> > >> >> > > the client side.
> > >> >> > >
> > >> >> > > > 3. UI
> > >> >> > > >
> > >> >> > > > a. Rescale History UI (related to the 'durationInMillis' attribute)
> > >> >> > > >
> > >> >> > > > > Rescale History UI
> > >> >> > > > > The history looks nice. What about making the duration of the inProgress
> > >> >> > > > > rescales dynamic, i.e. counting the seconds up from the start time? Keeping
> > >> >> > > > > the NA is also fine if the dynamic approach is too complicated.
> > >> >> > > >
> > >> >> > > > In my limited reading,
> > >> >> > > > this is feasible from an implementation perspective,
> > >> >> > > > though it may require some adjustments.
> > >> >> > > > If we remove the durationInMillis field from rescale,
> > >> >> > > > the frontend would need to perform some additional processing when
> > >> >> > > > displaying the data.
> > >> >> > > > For example:
> > >> >> > > > rescale{terminalState=inProgress, startTimestampInMillis=1,
> > >> >> > > > endTimestampInMillis=null, durationInMillis=3}
> > >> >> > > > If we keep the durationInMillis field, the frontend would need almost no
> > >> >> > > > logic and could simply display the data as is.
> > >> >> > > > If we do not keep the durationInMillis field, the frontend would need to do
> > >> >> > > > two things when rendering:
> > >> >> > > > - Calculate durationInMillis based on startTimestampInMillis and
> > >> >> > > > endTimestampInMillis
> > >> >> > > > - When displaying records with terminalState = inProgress, show
> > >> >> > > > endTimestampInMillis as null
> > >> >> > > >
> > >> >> > > > Similarly, for handling durationInMillis in schedulerState,
> > >> >> > > > I'm not sure whether such scenarios would arise,
> > >> >> > > > although we have not yet considered
> > >> >> > > > whether this data should be displayed in the same way as
> > >> >> > > > Rescale.durationInMillis.
> > >> >> > > > Although the difference is small,
> > >> >> > > > it is worth clarifying so that we can better evaluate the decision.
> > >> >> > > >
> > >> >> > > > Therefore, please let me know your thoughts on
> > >> >> > > > - whether we should keep the durationInMillis field for both Rescale and
> > >> >> > > > schedulerState in the schema
> > >> >> > > > - Show N.A in the duration of InProgress Rescale and remove the
> > >> >> > > > durationInMillis in the related sub-json.
> > >> >> > > > - Or something reasonable from you.
> > >> >> > >
> > >> >> > > As mentioned in 2.f), I would remove the duration and calculate it
> > >> >> > > dynamically in the client code. It shouldn't be a too complex operation and
> > >> >> > > allows us to keep the duration dynamic for rescales in progress.
> > >> >> > >
> > >> >> > > > b. Rescale Overview UI.
> > >> >> > > >
> > >> >> > > > > Rescale Overview UI
> > >> >> > > > > The screenshot shows "Acquired profile" twice for the slot (based on the
> > >> >> > > > > details UI, the first one is supposed to be "required").
> > >> >> > > >
> > >> >> > > > Sorry for the typo.
> > >> >> > > > I corrected it.
> > >> >> > > >
> > >> >> > > > > Additionally, in
> > >> >> > > > > FLIP-495 we agreed on four metrics: previous, sufficient, desired and
> > >> >> > > > > acquired resources (for parallelism and profile). Should we use those in
> > >> >> > > > > the UI as well?
> > >> >> > > >
> > >> >> > > > Okay. Updated it in the related UI draft pages.
> > >> >> > > >
> > >> >> > > > > We might want to add tooltips to the headers as well to
> > >> >> > > > > add a description for each of the metrics.
> > >> >> > > >
> > >> >> > > > > Could we add tooltips to the headers of the rescale overview to describe
> > >> >> > > > > the different IDs?
> > >> >> > > >
> > >> >> > > > Yes, the suggestion is reasonable.
> > >> >> > > > I added descriptions of the hint messages for some core header
> > >> >> > > > attributes after the corresponding UI draft pages.
> > >> >> > > > Looking forward to your opinion.
> > >> >> > > >
> > >> >> > > > 4. New items added by me:
> > >> >> > > > I have added notes after some sections of the core UI pages regarding
> > >> >> > > > limiting the displayed length of UUID-type identifiers and issues related
> > >> >> > > > to task names.
> > >> >> > > >
> > >> >> > > > I'd greatly appreciate any suggestions you may have.
> > >> >> > > >
> > >> >> > > > Best regards,
> > >> >> > > > Yuepeng Pan
> > >> >> > > >
> > >> >> > > > On Thu, Dec 18, 2025 at 18:08, Matthias Pohl <[email protected]> wrote:
> > >> >> > > >
> > >> >> > > > > Hi Yuepeng,
> > >> >> > > > > I finally found some time to look into that FLIP again. Sorry for the
> > >> >> > > > > delay. Thanks for working on this topic and pushing it.
> > >> >> > > > > Here are a few more
> > >> >> > > > > comments on the current state of FLIP-487:
> > >> >> > > > >
> > >> >> > > > > Adaptive Scheduler will support record and query the rescale history
> > >> >> > > > > in[2].
> > >> >> > > > >
> > >> >> > > > > Shouldn't it refer to reference #3, i.e. FLIP-495?
> > >> >> > > > >
> > >> >> > > > > nit: In the wiki, we do not need to add the references but use links with
> > >> >> > > > > proper link text (e.g. in the motivation paragraph). That should improve
> > >> >> > > > > readability.
> > >> >> > > > >
> > >> >> > > > > extended schema of the response for /jobs/overview
> > >> >> > > > >
> > >> >> > > > > The extract of the schema extension is not precise: We should show, that
> > >> >> > > > > the new fields are added to the item type
> > >> >> > > > > (urn:jsonschema:org:apache:flink:runtime:messages:webmonitor:JobDetails).
> > >> >> > > > > About the field name formatting of "job-type": We still do not have this
> > >> >> > > > > one included in the code convention. But AFAIS, we usually follow camelCase
> > >> >> > > > > format rather than kebab-casing. But especially the Job overview uses both
> > >> >> > > > > already.
> > >> >> > > > >
> > >> >> > > > > Could we add tool tips to the headers of the rescale overview to describe
> > >> >> > > > > the different IDs?
> > >> >> > > > >
> > >> >> > > > > Schema of response for /jobs/:jobid/rescales
> > >> >> > > > >
> > >> >> > > > > I noticed that also for the other JSON schemas, we jump between formats
> > >> >> > > > > (even introducing snake_casing). Let's unify them and stick to camelCase.
> > >> >> > > > > WDYT?
> > >> >> > > > >
> > >> >> > > > > For "summary.rescaleCounts", we might not need to add the "_rescales"
> > >> >> > > > > suffix to the record fields since the parent indicates already that all of
> > >> >> > > > > the fields are rescale counts. We, therefore, could use "inProgress",
> > >> >> > > > > "ignored", "completed", "failed". Do we see value in adding the total
> > >> >> > > > > value? That could be easily calculated using the other four metrics. Hence,
> > >> >> > > > > I think we can consider it as being redundant and remove it.
> > >> >> > > > >
> > >> >> > > > > "rescales_duration_stats"
> > >> >> > > > >
> > >> >> > > > > For all the "durationStats"? Can we add the time unit to make things
> > >> >> > > > > clearer, e.g. "rescalesDurationStats" becomes
> > >> >> > > > > "rescalesDurationStatsInMillis"? ...same applies to the timestamps
> > >> >> > > > >
> > >> >> > > > > "ignored_rescales_duration_stats"
> > >> >> > > > >
> > >> >> > > > > Are the stats useful for rescales which were actually not executed?
> > >> >> > > > >
> > >> >> > > > > duration
> > >> >> > > > >
> > >> >> > > > > Rescale details already contain the start and end time. Adding the duration
> > >> >> > > > > here shouldn't be necessary.
> > >> >> > > > >
> > >> >> > > > > Rescale Overview UI
> > >> >> > > > >
> > >> >> > > > > The screenshot shows "Acquired profile" twice for the slot (based on the
> > >> >> > > > > details UI, the first one is supposed to be "required"). Additionally, in
> > >> >> > > > > FLIP-495 we agreed on four metrics: previous, sufficient, desired and
> > >> >> > > > > acquired resources (for parallelism and profile). Should we use those in
> > >> >> > > > > the UI as well?
> > >> >> > > > > We might want to add tool tips to the headers as well to
> > >> >> > > > > add a description for each of the metrics.
> > >> >> > > > >
> > >> >> > > > > Rescale History UI
> > >> >> > > > >
> > >> >> > > > > The history looks nice. What about making the duration of the inProgress
> > >> >> > > > > rescales dynamic, i.e. counting the seconds up from the start time? Keeping
> > >> >> > > > > the NA is also fine if the dynamic approach is too complicated.
> > >> >> > > > >
> > >> >> > > > > Best,
> > >> >> > > > > Matthias
> > >> >> > > > >
> > >> >> > > > > On Wed, Nov 5, 2025 at 11:24 AM Yuepeng Pan <[email protected]> wrote:
> > >> >> > > > >
> > >> >> > > > > > Bumping this thread. Thanks!
> > >> >> > > > > >
> > >> >> > > > > > Best regards,
> > >> >> > > > > > Yuepeng Pan
> > >> >> > > > > >
> > >> >> > > > > > On 2025/09/02 15:41:07 Yuepeng Pan wrote:
> > >> >> > > > > > > Hi, community.
> > >> >> > > > > > >
> > >> >> > > > > > > At present, FLIP-495[1][2] has gone through a new round of discussions
> > >> >> > > > > > > and a preliminary general consensus has been reached, which provides the
> > >> >> > > > > > > necessary premise for the discussion of the current FLIP-487[3].
> > >> >> > > > > > >
> > >> >> > > > > > > Therefore, I would like to resume the discussion on the current FLIP.
> > >> >> > > > > > >
> > >> >> > > > > > > The version of the current FLIP mainly covers and has completed the
> > >> >> > > > > > > following two aspects of design:
> > >> >> > > > > > > - The REST API design for querying rescale history information
> > >> >> > > > > > > - The Web UI design for showing rescale history information
> > >> >> > > > > > >
> > >> >> > > > > > > Looking forward to your comments and suggestions.
> > >> >> > > > > > >
> > >> >> > > > > > > [1] https://lists.apache.org/thread/t3r9wdd5gpbqnvzw35kb3wb3d9brpnon
> > >> >> > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> > >> >> > > > > > > [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
> > >> >> > > > > > >
> > >> >> > > > > > > Best regards,
> > >> >> > > > > > > Yuepeng Pan
> > >> >> > > > > > >
> > >> >> > > > > > > ---- Replied Message ----
> > >> >> > > > > > > | From | Matthias Pohl<[email protected]> |
> > >> >> > > > > > > | Date | 12/2/2024 16:59 |
> > >> >> > > > > > > | To | <[email protected]> |
> > >> >> > > > > > > | Subject | Re: [DISCUSS] FLIP-487: Show history of rescales in Web UI for AdaptiveScheduler |
> > >> >> > > > > > > Hi Yuepeng,
> > >> >> > > > > > > thanks for the proposal. Having a way to see the history of rescales is a
> > >> >> > > > > > > nice feature, I guess.
> > >> >> > > > > > > I went over the draft and have a few questions:
> > >> >> > > > > > >
> > >> >> > > > > > > Can we reorganize the draft? Right now, we have some (for RescaleEvent,
> > >> >> > > > > > > Required/AcquiredParallelism) schema defined in the "Proposed Changes"
> > >> >> > > > > > > section and some other schema under "Public Interfaces". It would be nice
> > >> >> > > > > > > to have this more organized.
> > >> >> > > > > > > Just as a suggestion: In the end the proposed changes should list the
> > >> >> > > > > > > different REST endpoints you want to introduce (including the corresponding
> > >> >> > > > > > > schemas for request and response).
> > >> >> > > > > > > ---
> > >> >> > > > > > > I'm also wondering whether it would make sense to focus on the REST
> > >> >> > > > > > > endpoints in this FLIP and put the UI work in a separate FLIP. WDYT?
> > >> >> > > > > > > Decreasing the scope would probably help handling the required changes.
> > >> >> > > > > > > ---
> > >> >> > > > > > > Have you considered adding the onChange event timestamp for a rescale event
> > >> >> > > > > > > as well? We introduced a separation of the job requirements change event
> > >> >> > > > > > > and the actual rescale execution in FLIP-461 [1]. It might be worth
> > >> >> > > > > > > documenting the time when a change was monitored for the first time that
> > >> >> > > > > > > triggered the rescale. WDYT?
> > >> >> > > > > > > ---
> > >> >> > > > > > > You're mentioning "comments" as a field of the RescaleEvent in your
> > >> >> > > > > > > proposal. What's the use-case here?
> > >> >> > > > > > > Where are these comments from?
> > >> >> > > > > > >
> > >> >> > > > > > > (update)
> > >> >> > > > > > > A brief talk with Yuepeng on that topic revealed that the field is supposed to be used for errors that occurred during the rescale operation. My take on that one:
> > >> >> > > > > > > - We might want to reconsider the field name in that case (maybe errors_during_rescale?). "comments" seems to be quite generic.
> > >> >> > > > > > > - Additionally, shouldn't we make this a list of errors rather than a String field?
> > >> >> > > > > > > - How certain are we that we can associate errors with the actual rescale operation, rather than the error being caused by something else?
> > >> >> > > > > > > ---
> > >> >> > > > > > > In the schema of the RescaleEvent you describe the three different ID/numbers in the following way:
> > >> >> > > > > > >
> > >> >> > > > > > > The ‘id’ is automatically incremental, The rescaleAttemptId is generated based on one specified resource-requirement and the attempt number is generated based on rescaleAttemptId.
> > >> >> > > > > > >
> > >> >> > > > > > > But there is no "attempt number" mentioned in the RescaleEvent schema. Additionally, what is the ID based on? Do we start from 0 and just increment? Or do we want to have a mechanism that ensures that the IDs are also unique/monotonically increasing after JobManager failovers?
> > >> >> > > > > > > ---
> > >> >> > > > > > > For the parallelism schema: I might be misreading the draft here but you're proposing to use the subtask name as the ID to refer to the JobVertex? The name might become quite long. What about using the JobVertexID here? That would also be more aligned with how the parallelism is represented by the /jobs/<job-id>/resource-requirements endpoint. If we want to add the task name for readability purposes, we can still add this one as a taskName field to the Required/AcquiredParallelism schema.
> > >> >> > > > > > > ---
> > >> >> > > > > > > Status field:
> > >> >> > > > > > > - What is the meaning of "TRYING"? I guess, we're more or less using the AdaptiveScheduler states here, aren't we? Can't we align/stick to the naming that's defined in the AdaptiveScheduler state?
> > >> >> > > > > > > ---
> > >> >> > > > > > > Do we really need a new REST endpoint for the configuration? Can't we get the provided information already from the existing configuration endpoint? That said, I still find it useful to have a config tab in the UI at the end.
> > >> >> > > > > > > ---
> > >> >> > > > > > > For the summary endpoint: I see similarities to the checkpoint summary here.
> > >> >> > > > > > > Not sure whether you already considered that but would it make sense to align the field names in some way to have a consistent look-and-feel? I'm also wondering whether it makes sense to align the schema to have something like latest rescale, failed rescale, ...
> > >> >> > > > > > >
> > >> >> > > > > > > Best,
> > >> >> > > > > > > Matthias
> > >> >> > > > > > >
> > >> >> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler
> > >> >> > > > > > >
> > >> >> > > > > > > On Mon, Nov 25, 2024 at 11:24 AM yuanfeng hu <[email protected]> wrote:
> > >> >> > > > > > >
> > >> >> > > > > > > +1, I think this feature is very useful for adaptive scheduler.
> > >> >> > > > > > >
> > >> >> > > > > > > Yuepeng Pan <[email protected]> wrote on Fri, Nov 22, 2024 at 18:38:
> > >> >> > > > > > >
> > >> >> > > > > > > Hi community,
> > >> >> > > > > > >
> > >> >> > > > > > > Currently, the Adaptive Scheduler already supports the REST API to manually adjust[1] the parallelism of jobs, which enhances the functionality of the Adaptive Scheduler.
> > >> >> > > > > > >
> > >> >> > > > > > > However, Adaptive Scheduler doesn't support displaying or tracing the rescale history yet[2].
> > >> >> > > > > > >
> > >> >> > > > > > > This makes it inconvenient for users/devs to quickly obtain internal information about the rescale history of the Adaptive Scheduler.
> > >> >> > > > > > >
> > >> >> > > > > > > Showing the history of rescale events of the AdaptiveScheduler in the web UI would help users decide on the next step for their jobs.
> > >> >> > > > > > >
> > >> >> > > > > > > Therefore, I created the FLIP-487[3] doc to support 'Show history of rescales in Web UI for AdaptiveScheduler'.
> > >> >> > > > > > >
> > >> >> > > > > > > Please refer to the Google document[3] for more details about the proposed design and implementation.
> > >> >> > > > > > >
> > >> >> > > > > > > Looking forward to any feedback and opinions on this proposal.
> > >> >> > > > > > >
> > >> >> > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > >> >> > > > > > > [2] https://issues.apache.org/jira/browse/FLINK-22258
> > >> >> > > > > > > [3] https://docs.google.com/document/d/1WrLBkSkYe2tBQ3j66gKHFr2OB0d1HuHKDrRVr6B8nkM/edit?tab=t.0
> > >> >> > > > > > >
> > >> >> > > > > > > Thank you very much.
> > >> >> > > > > > >
> > >> >> > > > > > > Best,
> > >> >> > > > > > > Regards.
> > >> >> > > > > > > Yuepeng Pan
> > >> >> > > > > > >
> > >> >> > > > > > > --
> > >> >> > > > > > > Best,
> > >> >> > > > > > > Yuanfeng
