KR-bluejay commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2339937761
##########
ballista/executor/src/execution_loop.rs:
##########
@@ -88,8 +90,29 @@ pub async fn poll_loop<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan>
         match poll_work_result {
             Ok(result) => {
-                let tasks = result.into_inner().tasks;
+                let PollWorkResult {
+                    tasks,
+                    jobs_to_clean,
+                } = result.into_inner();
                 active_job = !tasks.is_empty();
+                let work_dir = PathBuf::from(&executor.work_dir);
+
+                // Clean up any state related to the listed jobs

Review Comment:
(Just to clarify, these are follow-up thoughts and do not block this PR.)

By the way, do you already have thoughts on more fundamental changes in this area? From my side, I see a few short-term directions:

1. There is some duplication with `clean_all_shuffle_data`, so a simple first step would be to refactor the current remove-data logic.
2. In push-based cleanup, instead of broadcasting to all executors, it could be more efficient to notify only those that actually hold the job data.
3. Right now, when the scheduler calls `clean_up_successful_job` / `clean_up_failed_job`, each job is handled by spawning a separate sleep and then calling `state.remove_job(job_id)` individually.

I’d like to explore these as follow-up work, but I don’t yet have a clear picture of the best design. Do you have any guidance or preferred direction here?

If this level of discussion feels too broad for the PR, I can open a separate issue to track these ideas (since the existing issue is about something else).
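
To make directions 2 and 3 above a bit more concrete, here is a minimal sketch of what I have in mind. It is only one possible reading of those ideas, not existing Ballista code: `JobDataIndex`, `CleanupQueue`, and all of their methods are hypothetical names, and deadlines are plain `u64` seconds to keep the example independent of any runtime.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical scheduler-side index of which executors hold data for a job.
/// Assumption: the scheduler can learn this when a task reports its output.
#[derive(Default)]
struct JobDataIndex {
    holders: HashMap<String, HashSet<String>>,
}

impl JobDataIndex {
    /// Record that `executor_id` wrote shuffle output for `job_id`.
    fn record(&mut self, job_id: &str, executor_id: &str) {
        self.holders
            .entry(job_id.to_string())
            .or_default()
            .insert(executor_id.to_string());
    }

    /// Forget the job and return only the executors that need a cleanup
    /// notification (sorted so the result is deterministic).
    fn take_holders(&mut self, job_id: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .holders
            .remove(job_id)
            .unwrap_or_default()
            .into_iter()
            .collect();
        ids.sort();
        ids
    }
}

/// Hypothetical single queue of delayed removals, keyed by deadline in
/// seconds, as an alternative to one spawned sleep per job.
#[derive(Default)]
struct CleanupQueue {
    due: BTreeMap<u64, Vec<String>>,
}

impl CleanupQueue {
    /// Schedule `job_id` for removal once `deadline_secs` has passed.
    fn schedule(&mut self, job_id: &str, deadline_secs: u64) {
        self.due
            .entry(deadline_secs)
            .or_default()
            .push(job_id.to_string());
    }

    /// Return every job whose deadline is at or before `now_secs`,
    /// leaving later deadlines in the queue.
    fn drain_due(&mut self, now_secs: u64) -> Vec<String> {
        // `split_off` keeps keys < bound in `self.due` and returns the rest.
        let later = self.due.split_off(&now_secs.saturating_add(1));
        let ready = std::mem::replace(&mut self.due, later);
        ready.into_values().flatten().collect()
    }
}
```

With something like this, a single periodic task could call `drain_due` and, for each returned job, use `take_holders` to decide which executors to notify, rather than broadcasting. Whether the scheduler actually has the holder information available at that point is an open question.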