KR-bluejay commented on code in PR #1314:
URL: 
https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2339937761


##########
ballista/executor/src/execution_loop.rs:
##########
@@ -88,8 +90,29 @@ pub async fn poll_loop<T: 'static + AsLogicalPlan, U: 
'static + AsExecutionPlan>
 
         match poll_work_result {
             Ok(result) => {
-                let tasks = result.into_inner().tasks;
+                let PollWorkResult {
+                    tasks,
+                    jobs_to_clean,
+                } = result.into_inner();
                 active_job = !tasks.is_empty();
+                let work_dir = PathBuf::from(&executor.work_dir);
+
+                // Clean up any state related to the listed jobs

Review Comment:
   (Just to clarify, these are follow-up thoughts, not blocking this PR.)
   
   By the way, do you already have thoughts on more fundamental changes in this 
area?
   From my side, I see a couple of short-term directions:
   
   1. There is some duplication with `clean_all_shuffle_data`, so as a simple 
step it might make sense to refactor the current remove-data logic.  
   2. In push-based cleanup, instead of broadcasting to all executors, it could 
be more efficient to notify only those that actually hold the job data.  
   3. Right now, when the scheduler calls `clean_up_successful_job` / 
`clean_up_failed_job`, each job is handled by spawning a separate sleep and then 
calling `state.remove_job(job_id)` individually.  
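   
   To make point 2 a bit more concrete, here is a rough sketch, assuming the 
scheduler can record which executors were assigned tasks for each job. 
`JobLocations`, `record_task`, and `take_cleanup_targets` are hypothetical 
names for illustration, not existing Ballista APIs.
   
   ```rust
   use std::collections::{HashMap, HashSet};

   /// Hypothetical scheduler-side index: job id -> executors that ran its tasks.
   /// Not part of Ballista; names and types are assumptions for illustration.
   #[derive(Default)]
   struct JobLocations {
       by_job: HashMap<String, HashSet<String>>,
   }

   impl JobLocations {
       /// Record that `executor_id` was assigned at least one task of `job_id`.
       fn record_task(&mut self, job_id: &str, executor_id: &str) {
           self.by_job
               .entry(job_id.to_string())
               .or_default()
               .insert(executor_id.to_string());
       }

       /// Drop the job from the index and return only the executors that may
       /// hold its data, sorted so the result is deterministic.
       fn take_cleanup_targets(&mut self, job_id: &str) -> Vec<String> {
           let mut targets: Vec<String> = self
               .by_job
               .remove(job_id)
               .map(|executors| executors.into_iter().collect())
               .unwrap_or_default();
           targets.sort();
           targets
       }
   }
   ```
   
   With something like this, the cleanup path could iterate over 
`take_cleanup_targets(job_id)` instead of all registered executors. An unknown 
job yields an empty list, so no executor would be contacted.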
   
   I’d like to explore these as follow-up work, but I don’t yet have a clear 
picture of the best design.  
   Do you have any guidance or preferred direction here?
   
   If this level of discussion feels too broad for the PR, I can open a 
separate issue to track these ideas (since the existing issue is about 
something else).
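   
   For point 3, one possible direction is a single queue of pending removals 
that one periodic sweeper drains, rather than one sleep per job. The sketch 
below is only an illustration under that assumption: `DelayedCleanupQueue`, 
`schedule`, and `drain_due` are hypothetical names, and it relies on a fixed 
delay with non-decreasing `now` values so that insertion order matches 
deadline order.
   
   ```rust
   use std::collections::VecDeque;
   use std::time::{Duration, Instant};

   /// Hypothetical queue of jobs waiting for delayed removal.
   /// Not part of Ballista; shown only to illustrate batching.
   struct DelayedCleanupQueue {
       delay: Duration,
       // Assumes a fixed delay and non-decreasing `now` values passed to
       // `schedule`, so the front entry always has the earliest deadline.
       pending: VecDeque<(Instant, String)>,
   }

   impl DelayedCleanupQueue {
       fn new(delay: Duration) -> Self {
           Self {
               delay,
               pending: VecDeque::new(),
           }
       }

       /// Queue `job_id` for removal once `delay` has elapsed since `now`.
       fn schedule(&mut self, job_id: &str, now: Instant) {
           self.pending
               .push_back((now + self.delay, job_id.to_string()));
       }

       /// Pop every job whose deadline is at or before `now`, oldest first.
       fn drain_due(&mut self, now: Instant) -> Vec<String> {
           let mut due = Vec::new();
           while self.pending.front().map_or(false, |entry| entry.0 <= now) {
               if let Some((_, job_id)) = self.pending.pop_front() {
                   due.push(job_id);
               }
           }
           due
       }
   }
   ```
   
   A single background task could then call `drain_due(Instant::now())` on an 
interval and pass each returned id to `state.remove_job(job_id)`.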



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
