[GitHub] [arrow-ballista] thinkharderdev commented on a diff in pull request #184: Executor lost handling

GitBox Sun, 04 Sep 2022 10:42:25 -0700


thinkharderdev commented on code in PR #184:
URL: https://github.com/apache/arrow-ballista/pull/184#discussion_r962345779



##########
ballista/rust/scheduler/src/state/execution_graph.rs:
##########
@@ -581,6 +726,54 @@ impl ExecutionGraph {
         }
     }
 
+    /// Convert running stage to be unresolved
+    fn rollback_running_stage(&mut self, stage_id: usize) -> Result<bool> {

Review Comment:
   I'm a little confused by this. If tasks are already running, then they will 
either fail (because they cannot fetch their input partitions from the lost 
executor) or have already fetched those partitions, in which case we shouldn't 
roll back. I'm not clear on how this process interacts with the incoming task 
updates. Shouldn't the rollback happen in response to a task failure rather 
than just the executor missing a heartbeat?
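
   To make the question concrete, here is a minimal sketch of a rollback that 
is driven by task updates rather than by a missed heartbeat. Everything in it 
(`TaskUpdate`, `StageAction`, `on_task_update`, the executor ids) is a 
hypothetical stand-in for illustration, not Ballista's actual types or API, and 
it assumes a failed task can report which executor it could not fetch from.

```rust
use std::collections::HashSet;

/// Hypothetical status a running task reports back to the scheduler.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TaskUpdate {
    /// The task finished, so it had already fetched all of its input.
    Completed,
    /// The task failed because it could not fetch input partitions
    /// from the named executor.
    FetchFailed { executor_id: String },
    /// The task failed for a reason unrelated to shuffle input.
    Failed,
}

/// Hypothetical decision for the stage that owns the task.
#[derive(Debug, PartialEq, Eq)]
enum StageAction {
    /// Leave the running stage as it is.
    Keep,
    /// Convert the running stage back to unresolved.
    Rollback,
}

/// Decide what to do with a running stage when one of its tasks reports in.
/// A rollback is requested only when the task failed to fetch input from an
/// executor that is known to be lost; a lost executor alone changes nothing.
fn on_task_update(update: &TaskUpdate, lost_executors: &HashSet<String>) -> StageAction {
    match update {
        TaskUpdate::FetchFailed { executor_id } if lost_executors.contains(executor_id) => {
            StageAction::Rollback
        }
        _ => StageAction::Keep,
    }
}
```

   With this shape, a task that already pulled its partitions before the 
executor disappeared completes normally and its stage is left alone, while a 
fetch failure against a lost executor is the only trigger for a rollback.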



##########
ballista/rust/scheduler/src/planner.rs:
##########
@@ -246,6 +246,31 @@ pub fn remove_unresolved_shuffles(
     Ok(with_new_children_if_necessary(stage, new_children)?)
 }
 
+pub fn rollback_resolved_shuffles(

Review Comment:
   A comment here explaining what this function does would be helpful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
