Re: [PR] [FLINK-33977][runtime] Adaptive scheduler may not minimize the number of TMs during downscaling [flink]

via GitHub Mon, 23 Sep 2024 18:38:00 -0700


1996fanrui commented on code in PR #25218:
URL: https://github.com/apache/flink/pull/25218#discussion_r1772439538



##########
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adaptive/allocator/StateLocalitySlotAssigner.java:
##########
@@ -139,6 +154,36 @@ public Collection<SlotAssignment> assignSlots(
         return assignments;
     }
 
+    /**
+     * The sorting principle and strategy here are very similar to {@link

Review Comment:
   > The user explicitly set local state recovery in the scenario where 
`StateLocalitySlotAssigner` is used (see 
[execution.state-recovery.from-local](https://github.com/apache/flink/blob/7adeecd3445947f42d3e3d1e2961b9464e910236/flink-core/src/main/java/org/apache/flink/configuration/StateRecoveryOptions.java#L108)),
 i.e. the user might value keeping the state on the machine in that case. 🤔
   
   Yes, you are right. Another scenario is that users want to obtain the 
benefits of local recovery (quickly recover the state) without wasting 
resources.
   
   If both cases are needed, it seems we need an additional option to 
control it. (When using `StateLocalitySlotAssigner`, should local recovery 
take priority, or should resource usage?)
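   
   A minimal sketch of how such a switch might behave, in plain Java with no 
Flink dependencies. Everything here is an assumption made for illustration: 
the `Priority` enum, the `TmCandidate` summary, and the ordering heuristic are 
hypothetical and do not exist in Flink.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class AssignerPrioritySketch {

    /** Hypothetical option values; Flink has no such option today. */
    enum Priority { LOCAL_STATE, MINIMAL_TASK_MANAGERS }

    /** Hypothetical per-TaskManager summary, used only for this illustration. */
    static final class TmCandidate {
        final String tmId;
        final int freeSlots;
        final long localStateBytes;

        TmCandidate(String tmId, int freeSlots, long localStateBytes) {
            this.tmId = tmId;
            this.freeSlots = freeSlots;
            this.localStateBytes = localStateBytes;
        }
    }

    /** Orders TaskManagers by the chosen priority; the other criterion breaks ties. */
    static List<String> preferredOrder(List<TmCandidate> candidates, Priority priority) {
        // More locally available state first.
        Comparator<TmCandidate> moreLocalState =
                Comparator.comparingLong((TmCandidate c) -> c.localStateBytes).reversed();
        // More free slots first, so fewer TaskManagers are needed overall.
        Comparator<TmCandidate> moreFreeSlots =
                Comparator.comparingInt((TmCandidate c) -> c.freeSlots).reversed();

        Comparator<TmCandidate> order = priority == Priority.LOCAL_STATE
                ? moreLocalState.thenComparing(moreFreeSlots)
                : moreFreeSlots.thenComparing(moreLocalState);

        return candidates.stream()
                .sorted(order)
                .map(c -> c.tmId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TmCandidate> candidates = Arrays.asList(
                new TmCandidate("tm-1", 2, 500L),
                new TmCandidate("tm-2", 4, 0L),
                new TmCandidate("tm-3", 1, 900L));

        System.out.println(preferredOrder(candidates, Priority.LOCAL_STATE));
        // prints [tm-3, tm-1, tm-2]
        System.out.println(preferredOrder(candidates, Priority.MINIMAL_TASK_MANAGERS));
        // prints [tm-2, tm-1, tm-3]
    }
}
```

   The point is only that the two goals can produce opposite orderings for 
the same set of TaskManagers, which is why a single fixed strategy cannot 
satisfy both.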
   
   Anyway, after our discussion, it's better to only update the 
`DefaultSlotAssigner` in this PR. The strategy of `StateLocalitySlotAssigner` 
can be discussed in a separate JIRA ticket or on the mailing list.
   
   WDYT?
   
   > > The state locality only take effect during the job recovery, it's an 
optimization.
   > 
   > Why would that only have an affect during job recovery (i.e. when the 
Dispatcher recovers the job)? Every rescale operation recovers from a 
checkpoint in the end. Or am I misunderstanding you here?
   
   Sorry, I didn't express myself clearly. I meant that `job recovery` 
happens whenever the job starts or rescales.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
