thinkharderdev commented on code in PR #184:
URL: https://github.com/apache/arrow-ballista/pull/184#discussion_r962345779
##########
ballista/rust/scheduler/src/state/execution_graph.rs:
##########
@@ -581,6 +726,54 @@ impl ExecutionGraph {
}
}
+ /// Convert running stage to be unresolved
+ fn rollback_running_stage(&mut self, stage_id: usize) -> Result<bool> {
Review Comment:
I'm a little confused by this. Tasks that are already running will either
fail (because they cannot fetch their input partitions from the lost executor)
or have already fetched those partitions, in which case we shouldn't roll
back. I'm not clear on how this process interacts with incoming task updates.
Shouldn't the rollback be triggered by a task failure rather than by the
executor merely missing a heartbeat?
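
To make the suggestion concrete, here is a rough sketch of what I have in
mind. Everything in it is hypothetical (`TaskState`, `FetchFailed`, and
`needs_rollback` are illustrative names, not Ballista's actual types or API);
it only shows the decision being driven by task status updates rather than by
the lost heartbeat alone:

```rust
/// Hypothetical task states for one stage; NOT Ballista's real types.
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum TaskState {
    Running,
    Completed,
    /// The task failed while fetching shuffle input from `executor_id`.
    FetchFailed { executor_id: String },
    /// The task failed for a reason unrelated to shuffle fetching.
    Failed,
}

/// Returns true only if at least one task in the stage reported a fetch
/// failure against the lost executor. Tasks that are still running or have
/// completed are assumed to have (or to be able to get) their input, so
/// they do not trigger a rollback on their own.
fn needs_rollback(tasks: &[TaskState], lost_executor: &str) -> bool {
    tasks.iter().any(|task| {
        matches!(
            task,
            TaskState::FetchFailed { executor_id }
                if executor_id.as_str() == lost_executor
        )
    })
}
```

With something like this, a stage whose tasks had already pulled their input
from the lost executor would keep running, and only a reported fetch failure
would send it back to unresolved.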
##########
ballista/rust/scheduler/src/planner.rs:
##########
@@ -246,6 +246,31 @@ pub fn remove_unresolved_shuffles(
Ok(with_new_children_if_necessary(stage, new_children)?)
}
+pub fn rollback_resolved_shuffles(
Review Comment:
A doc comment here explaining what this function does would be helpful.