yahoNanJing commented on code in PR #153: URL: https://github.com/apache/arrow-ballista/pull/153#discussion_r954431538
########## ballista/rust/scheduler/src/state/task_manager.rs: ########## @@ -55,44 +58,62 @@ pub struct TaskManager<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> clients: ExecutorClients, session_builder: SessionBuilder, codec: BallistaCodec<T, U>, + scheduler_id: String, + // Cache for active execution graphs curated by this scheduler + active_job_cache: ExecutionGraphCache, } impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> TaskManager<T, U> { pub fn new( state: Arc<dyn StateBackendClient>, session_builder: SessionBuilder, codec: BallistaCodec<T, U>, + scheduler_id: String, ) -> Self { Self { state, clients: Default::default(), session_builder, codec, + scheduler_id, + active_job_cache: Arc::new(RwLock::new(HashMap::new())), } } /// Generate an ExecutionGraph for the job and save it to the persistent state. + /// By default, this job will be curated by the scheduler which receives it. + /// Then we will also save it to the active execution graph pub async fn submit_job( &self, job_id: &str, session_id: &str, plan: Arc<dyn ExecutionPlan>, ) -> Result<()> { - let graph = ExecutionGraph::new(job_id, session_id, plan)?; + let mut graph = + ExecutionGraph::new(&self.scheduler_id, job_id, session_id, plan)?; self.state .put( Keyspace::ActiveJobs, job_id.to_owned(), - self.encode_execution_graph(graph)?, + self.encode_execution_graph(graph.clone())?, ) .await?; + graph.revive(); Review Comment: Running stages will not be persisted to the backend. Before cache the `execution_graph`, we need to revive it by converting all revolved stages to the running stages. And it's better to make the plan in the running stages be the encoded one to avoid encoding cost when creating hundreds of task definitions as mentioned in #142. Later I'll propose a PR for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org