thinkharderdev commented on a change in pull request #1983:
URL: https://github.com/apache/arrow-datafusion/pull/1983#discussion_r827714678
##########
File path: ballista/rust/scheduler/src/scheduler_server/event_loop.rs
##########
@@ -107,14 +125,19 @@ impl<T: 'static + AsLogicalPlan, U: 'static +
AsExecutionPlan>
}
})
.collect::<Vec<String>>(),
- executor_data.executor_id
+ executor_delta_data.executor_id
);
let mut client = {
let clients = self.executors_client.read().await;
- clients.get(&executor_data.executor_id).unwrap().clone()
+ clients
+ .get(&executor_delta_data.executor_id)
+ .unwrap()
+ .clone()
};
// Update the resources first
- self.state.save_executor_data(executor_data.clone());
+ self.state
+ .executor_manager
+ .update_executor_data(executor_delta_data);
// TODO check whether launching task is successful or not
client.launch_task(LaunchTaskParams { task: tasks }).await?;
Review comment:
It was just a thought, not sure if the performance cost would be
justifies. But the idea is that updating the executor state and launching the
tasks should in principle be an atomic transaction. If you update first and
then the launching fails, you resources will never be released but if you
launch first and then update state then you have a concurrency issue where
another thread may try and launch tasks after the task are launched and before
the state is updated so you end up overcommitted.
The second issue seems less of a problem because it seems pretty
straightforward to just have the executor reject the task if it has no
available task slots, so it may be enough to just reverse the order here
(launch tasks -> update state) and let the existing error handling mechanisms
deal with it. Since launching requires a network call, holding a write lock on
the entire state for that operation could be a major performance bottleneck, so
if we did need a lock it would probably have to be at the individual executor
level.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]