KR-bluejay opened a new issue, #1316: URL: https://github.com/apache/datafusion-ballista/issues/1316
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

There are several improvement points in the current job-data deletion flow:

1. *Duplication*
   The executor has `clean_all_shuffle_data` alongside other ad-hoc removal logic. These overlap in functionality, making the code harder to maintain and reason about.
2. *Push-based broadcast*
   When the scheduler initiates cleanup, it currently notifies all executors. This is inefficient because only a subset of executors actually hold the job’s data.
3. *Per-job deletion tasks*
   In `clean_up_successful_job` / `clean_up_failed_job`, the scheduler spawns a separate delayed task (`sleep`) for each job and calls `state.remove_job(job_id)` individually. This results in many small tasks and RPCs, which could be batched more efficiently.

**Describe the solution you'd like**

Unify cleanup behind a single, testable “deletion facility”:

1. *Deduplicate* logic with `clean_all_shuffle_data`; extract/keep a shared async remover (e.g., `remove_job_dir`) with safety checks.
2. *Targeted push*: notify only executors that actually hold the job’s data (no broadcast).
3. *Batching*: we already dispatch periodically; change each tick to send one batched `remove_jobs(Vec<JobId>)` for all pending IDs rather than spawning per-job sleeps and individual removals.

**Describe alternatives you've considered**

**Additional context**

Related: #1219, #1314
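
A minimal sketch of how the targeted, batched dispatch described in this issue might look on the scheduler side. It is an illustration only, not Ballista's actual code: `PendingCleanup`, `mark_for_removal`, and `drain_batches` are hypothetical names, job and executor IDs are plain strings, and the `remove_jobs` RPC is only printed rather than sent.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Assumption: IDs are modeled as plain strings for this sketch.
type JobId = String;
type ExecutorId = String;

/// Hypothetical scheduler-side queue of jobs whose data awaits removal.
#[derive(Default)]
struct PendingCleanup {
    /// Job ID -> executors known to hold that job's data.
    holders: BTreeMap<JobId, BTreeSet<ExecutorId>>,
}

impl PendingCleanup {
    /// Record that `executor` holds data for `job` and that it should be removed.
    fn mark_for_removal(&mut self, job: &str, executor: &str) {
        self.holders
            .entry(job.to_string())
            .or_default()
            .insert(executor.to_string());
    }

    /// Called once per tick: drain all pending jobs and group them by executor,
    /// so each executor that holds data receives a single batched request and
    /// executors without data receive nothing.
    fn drain_batches(&mut self) -> BTreeMap<ExecutorId, Vec<JobId>> {
        let mut batches: BTreeMap<ExecutorId, Vec<JobId>> = BTreeMap::new();
        for (job, executors) in std::mem::take(&mut self.holders) {
            for executor in executors {
                batches.entry(executor).or_default().push(job.clone());
            }
        }
        batches
    }
}

fn main() {
    let mut pending = PendingCleanup::default();
    pending.mark_for_removal("job-1", "exec-a");
    pending.mark_for_removal("job-1", "exec-b");
    pending.mark_for_removal("job-2", "exec-a");

    // One tick: a real scheduler would issue one `remove_jobs` RPC per entry.
    for (executor, jobs) in pending.drain_batches() {
        println!("remove_jobs({jobs:?}) -> {executor}");
    }
}
```

With this shape, the number of RPCs per tick is bounded by the number of executors that hold data, not by the number of finished jobs, and the per-job `sleep` tasks are no longer needed.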