yahoNanJing commented on a change in pull request #1983: URL: https://github.com/apache/arrow-datafusion/pull/1983#discussion_r827662903
########## File path: ballista/rust/scheduler/src/scheduler_server/event_loop.rs ########## @@ -107,14 +125,19 @@ impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> } }) .collect::<Vec<String>>(), - executor_data.executor_id + executor_delta_data.executor_id ); let mut client = { let clients = self.executors_client.read().await; - clients.get(&executor_data.executor_id).unwrap().clone() + clients + .get(&executor_delta_data.executor_id) + .unwrap() + .clone() }; // Update the resources first - self.state.save_executor_data(executor_data.clone()); + self.state + .executor_manager + .update_executor_data(executor_delta_data); // TODO check whether launching task is successful or not client.launch_task(LaunchTaskParams { task: tasks }).await?; Review comment: Hi @thinkharderdev, here, I didn't consider too much about the error handling previously. I think you suggestion of doing it reverse makes sense. Before updating the executor data, the tasks status have already been updated into Running status. Whether task launch successful or not, these tasks will be dangling in Running status. It's better for us to leverage speculative task execution in the future. And we don't need to consider too much of the situation of partial tasks launching success. Why do you think we need a write lock? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org