Thanks Matthias for the comments.

>- How is the resourceRequirementsEpochID generated?

Let me try to clarify. Each rescale requirement corresponds to one resourceRequirementsEpochID. A new value (a UUID) is generated for resourceRequirementsEpochID when the job is first started and each time an update-resource-requirements request is received from the REST API.

>- Why is the Resource ID section "hidden" (at least, I don't have access)
>in GoogleDocs and not added to the FLIP?

Sorry for the typo. I corrected it in the wiki page.

>- Can you add more details of where this ID is coming from?

Added.

>Slots section
>- I was surprised that we have a name for the SlotSharingGroup. Then I
>realized that we have 3 different implementations of the SlotSharingGroup
>in Flink. The one that's used in the scheduler doesn't have the name
>preserved but works with SlotSharingGroupIds. That, we might need to
>consider. If we pass down the name (which might make sense for the UI), we
>still want to expose the SlotSharingGroupId as well, I guess.

Yes. In simple terms, the group name is user-oriented, while the group ID is engine-oriented. Adding the group name would indeed be user-friendly.

>- What about exposing the ResourceProfile of the SlotSharingGroup here as
>well?

That makes sense to me! I updated it in the FLIP wiki page.

>Rescale Event/Rescale status sections
>- I'm not sure about the AdaptiveScheduler state to Rescale event state
>mapping that's included in the FLIP right now: Triggering rescaling only
>happens in the Executing state of the AdaptiveScheduler right now. Waiting
>for resources also happens while the job is running (i.e. Executing state).
>The AdaptiveScheduler will immediately transition from Executing to
>CreatingExecutionGraph state in case of rescaling (WaitingForResources is
>omitted). This was introduced FLIP-472 [3]
>
>- I'm wondering whether we can rely on the StateTransitionManager here
>(which was also introduced with FLIP-472 [3]).
>That instance is coupled
>with the Executing state (aside from WaitingForResources where it serves a
>different purpose) and holds the information about the rescale trigger
>event (and subsequent ignored rescale trigger events) and when the
>rescaling was actually initiated. There might not be a need to work

Thank you very much for the reminder. The proposal makes sense to me.
Additionally, I'd like to confirm whether each rescale cycle/event requires a status field, such as FAILED, IGNORED, SUCCESS, PENDING, etc. If such a status field is not needed, how do we record that a particular rescale request was ignored? Or do we not care about this situation and only plan to record successful rescale events?

>The first two are the resource configurations the rescale decision is based
>on. The last two are the actual applied resource configurations. Keep in
>mind that the latter two are not necessarily matching the resource
>configurations that were considered when deciding on the rescaling.
>Especially the case where the desired resources were met when rescaling was
>triggered but where task slots are lost while rescaling can have a
>surprising outcome. We might want to have this reflected in the rescale
>event.

Yes, this could happen during the slot-reservation phase, and it's important to record it. From my limited reading, this is feasible. We can collect the exceptions from this phase while gathering the scheduler state history, and record the specific information using the previously mentioned exception field and comments field, or use a FAILED status as the final status of the rescale event/record. WDYT?

>How/Where to store rescale events section
>- It makes sense to have the rescale event history be stored in the
>AdaptiveScheduler (analogously to what is done for the exception history).
>But can you elaborate a bit more on the different approaches (in-memory, on
>disk, DFS).
>Each of them have different outcome (in-memory: the history is
>gone as soon as the job reaches a globally-terminal state; on disk: rescale
>history survives the job termination; DFS: rescale history survives a JM
>failover).

Thanks for the comment. I updated the corresponding sections of the wiki based on the exception history mechanism. Please take a look.

>I feel like on disk approach (analogously to the exception
>history) makes the most sense here. WDYT?

Sorry, Matthias, IIUC, if the storage mechanism here is similar to that of the exception history, then we should choose the DFS approach, such as HDFS. Please correct me if I'm wrong.

BTW, the subsequent FLIP content will be maintained in the wiki page, and the version in Google Docs will be deprecated.

Thank you.

Best,
Yuepeng Pan

On 2024/12/18 08:49:32 Matthias Pohl wrote:
> Hi Yuepeng,
> Sorry for not finding the time to respond earlier. I went over FLIP-495 [1]
> and the previous FLIP-487 discussion [2]. Thanks for putting it all
> together in a FLIP. That makes it easier to discuss the next iteration.
> Here are a few comments I have:
>
> Rescale ID section
> - How is the resourceRequirementsEpochID generated?
> - Why is the Resource ID section "hidden" (at least, I don't have access)
> in GoogleDocs and not added to the FLIP?
> - Can you add more details of where this ID is coming from?
>
> Slots section
> - I was surprised that we have a name for the SlotSharingGroup. Then I
> realized that we have 3 different implementations of the SlotSharingGroup
> in Flink. The one that's used in the scheduler doesn't have the name
> preserved but works with SlotSharingGroupIds. That, we might need to
> consider. If we pass down the name (which might make sense for the UI), we
> still want to expose the SlotSharingGroupId as well, I guess.
> - What about exposing the ResourceProfile of the SlotSharingGroup here as
> well?
>
> Rescale Event/Rescale status sections
> - I'm not sure about the AdaptiveScheduler state to Rescale event state
> mapping that's included in the FLIP right now: Triggering rescaling only
> happens in the Executing state of the AdaptiveScheduler right now. Waiting
> for resources also happens while the job is running (i.e. Executing state).
> The AdaptiveScheduler will immediately transition from Executing to
> CreatingExecutionGraph state in case of rescaling (WaitingForResources is
> omitted). This was introduced FLIP-472 [3]
>
> - I'm wondering whether we can rely on the StateTransitionManager here
> (which was also introduced with FLIP-472 [3]). That instance is coupled
> with the Executing state (aside from WaitingForResources where it serves a
> different purpose) and holds the information about the rescale trigger
> event (and subsequent ignored rescale trigger events) and when the
> rescaling was actually initiated. There might not be a need to work
>
> - I also want to point out that we have four different notions of resource
> configurations:
>   - Desired resources: The ideal resource configuration that we want to
>   achieve for a job if enough Task slots are available (essentially the upper
>   bound of the job's parallelism)
>   - Sufficient resources: A minimum resource configuration that the job can
>   run on (the lower bound of the job's parallelism)
>   - Current resources: The resource configuration the job runs on before
>   rescaling
>   - Follow-up resources
> The first two are the resource configurations the rescale decision is based
> on. The last two are the actual applied resource configurations. Keep in
> mind that the latter two are not necessarily matching the resource
> configurations that were considered when deciding on the rescaling.
> Especially the case where the desired resources were met when rescaling was
> triggered but where task slots are lost while rescaling can have a
> surprising outcome. We might want to have this reflected in the rescale
> event.
>
> How/Where to store rescale events section
> - It makes sense to have the rescale event history be stored in the
> AdaptiveScheduler (analogously to what is done for the exception history).
> But can you elaborate a bit more on the different approaches (in-memory, on
> disk, DFS). Each of them have different outcome (in-memory: the history is
> gone as soon as the job reaches a globally-terminal state; on disk: rescale
> history survives the job termination; DFS: rescale history survives a JM
> failover). I feel like on disk approach (analogously to the exception
> history) makes the most sense here. WDYT?
>
> Best,
> Matthias
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> [2] https://lists.apache.org/thread/f4md4btkf006mxcxf66bng1kfz0rsn8c
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-472%3A+Aligning+timeout+logic+in+the+AdaptiveScheduler%27s+WaitingForResources+and+Executing+states
>
> On Tue, 17 Dec 2024, 16:21 Yuepeng Pan, <panyuep...@apache.org> wrote:
>
> > Hi community,
> >
> > We discussed several aspects of FLIP-487[1] 'Show history of rescales in
> > Web UI for AdaptiveScheduler'
> > and received a lot of valuable feedback. Based on the suggestions from the
> > email thread[2],
> > we plan to split the original proposal for FLIP-487[1].
> >
> > The current email thread and the FLIP-495[3] wiki will be used to discuss
> > 'Support AdaptiveScheduler in recording and querying the rescale history',
> > while FLIP-487[1] will primarily focus on displaying-related design content
> >
> > Looking forward to any feedback and opinions on FLIP-495[3].
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-487%3A+Show+history+of+rescales+in+Web+UI+for+AdaptiveScheduler
> >
> > [2] https://lists.apache.org/thread/f4md4btkf006mxcxf66bng1kfz0rsn8c
> >
> > [3]
> > https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-495%3A+Support+AdaptiveScheduler+record+and+query+the+rescale+history
> >
> > Thank you very much.
> >
> > Best,
> >
> > Regards.
> >
> > Yuepeng Pan
>
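
P.S. To make the resourceRequirementsEpochID and status-field points from the reply at the top of this mail a bit more concrete, here is a minimal Java sketch. It is only an illustration under assumptions: the class, method, and field names (RescaleHistorySketch, RescaleRecord, RescaleStatus, onJobStarted, onRequirementsUpdated, recordRescale, complete) are hypothetical and are not part of Flink or of the FLIP. Only two ideas come from the thread: a fresh UUID is generated when the job first starts and on every update request from the REST API, and a rescale record could carry one of the candidate statuses PENDING, SUCCESS, FAILED, IGNORED.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch only; none of these types exist in Flink.
class RescaleHistorySketch {

    // Candidate status values mentioned in the discussion.
    enum RescaleStatus { PENDING, SUCCESS, FAILED, IGNORED }

    // One entry of the rescale history, tied to the epoch it was triggered in.
    static final class RescaleRecord {
        final UUID resourceRequirementsEpochId;
        RescaleStatus status = RescaleStatus.PENDING;
        String comment = "";

        RescaleRecord(UUID resourceRequirementsEpochId) {
            this.resourceRequirementsEpochId = resourceRequirementsEpochId;
        }
    }

    private UUID currentEpochId;
    private final List<RescaleRecord> history = new ArrayList<>();

    // A new epoch ID is generated when the job is first started ...
    UUID onJobStarted() {
        return newEpoch();
    }

    // ... and each time the resource requirements are updated via REST.
    UUID onRequirementsUpdated() {
        return newEpoch();
    }

    private UUID newEpoch() {
        currentEpochId = UUID.randomUUID();
        return currentEpochId;
    }

    // Opens a PENDING record under the current epoch (assumes a job start happened first).
    RescaleRecord recordRescale() {
        RescaleRecord record = new RescaleRecord(currentEpochId);
        history.add(record);
        return record;
    }

    // Sets the final status, e.g. IGNORED for a dropped trigger or FAILED
    // when slots are lost while reserving them.
    void complete(RescaleRecord record, RescaleStatus status, String comment) {
        record.status = status;
        record.comment = comment;
    }

    List<RescaleRecord> history() {
        return history;
    }
}
```

With a status field like this, an ignored rescale request stays visible in the history instead of being silently dropped, which is the case asked about above. Whether the FLIP ends up with such a field is still open in this discussion.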