Hi all,

I'd like to pick up FLINK-38698, a critical bug where offloaded TaskInformation BLOBs accumulate without cleanup and can exhaust the BLOB store on long-running jobs.

Root cause: The adaptive scheduler rebuilds the ExecutionGraph on every restart or rescale, and each rebuild re-offloads deployment metadata to the BLOB store under fresh keys. The superseded graph's BLOBs are only removed at global job termination, so on a long-running job that restarts or rescales repeatedly they accumulate and are never reclaimed for the lifetime of the job. The default scheduler is not affected, since it reuses the same ExecutionGraph.

Proposed fix: Release a graph's offloaded deployment BLOBs at the points where the adaptive scheduler discards that graph, reusing the existing BLOB-deletion path on DefaultExecutionGraph. The release is scoped to graphs that are actually being discarded, so it never touches BLOBs a live graph still needs. No state, wire-format, or public API changes.

I have implemented this locally and validated it with tests covering the discard cases. I left the same proposal on the JIRA on June 2.

Could a committer please review the approach, and if it looks reasonable, assign the ticket to me so I can open a PR?

Thanks,
Spoorthi Basu
