danielhumanmod commented on issue #1344: URL: https://github.com/apache/datafusion-ballista/issues/1344#issuecomment-3640412709
> An alternative would be to add ability to register job completion callback which triggers change of job B once the job A finishes. Trigger can change UnresolvedShuffle to actual Exec (we could use same logic like we use for simple EXPLAIN). I believe job finish callback would be useful for other cases, I'm not sure at this point how complicated would it be to implement, may have a look this weekend. (https://github.com/apache/datafusion-ballista/pull/1309#issuecomment-3414552929)

Hey @milenkovicm, I did some investigation into this approach. The current processing logic for the `QueryStageSchedulerEvent::TaskUpdating` event already behaves like a built-in callback chain:

```
TaskUpdating event
  → update_task_statuses
  → update_task_status
  → update_stage_output_links
  → processing_stages_update
  → resolve_stage
  → remove_unresolved_shuffles
```

This whole chain is tightly coupled, so extracting it into a standalone job-completion callback looks non-trivial.

However, `remove_unresolved_shuffles()` is the point where `UnresolvedShuffleExec` is replaced by `ShuffleReaderExec`, so another reasonable approach might be to add a lightweight abstraction for controlling how shuffles get resolved: something like a `ShuffleResolver` that lets us plug in different `UnresolvedShuffleExec` replacement strategies:

```
trait ShuffleResolver {
    fn should_resolve(...) -> bool;
    fn resolve(...) -> Result<Arc<dyn ExecutionPlan>>;
}
```

The flow would look like this:

```
remove_unresolved_shuffles()
  ↓
resolver_manager.resolve()
  ↓
select_resolver()
  ├─ AnalyzeShuffleResolver.should_resolve()?
  │    ├─ true  → use this resolver and stop
  │    └─ false → continue
  │
  ├─ DefaultShuffleResolver.should_resolve()?
  │    └─ true  → use this and stop
  │
  ...
```
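
To make the first-match selection in the proposed `ShuffleResolver` flow concrete, here is a minimal, self-contained sketch. It does not use Ballista or DataFusion types: the `Plan` enum stands in for `Arc<dyn ExecutionPlan>`, and `ResolveContext`, its `is_analyze` flag, the `AnalyzeShuffleReader` variant, and the method signatures are all hypothetical names chosen for illustration, since the comment leaves the parameters elided.

```rust
use std::sync::Arc;

// Stand-in for a physical plan node. The real code would work with
// `Arc<dyn ExecutionPlan>`; this enum is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    UnresolvedShuffle { stage_id: usize },
    ShuffleReader { stage_id: usize },
    // Hypothetical node that an ANALYZE-aware resolver might produce.
    AnalyzeShuffleReader { stage_id: usize },
}

// Hypothetical context handed to every resolver.
struct ResolveContext {
    is_analyze: bool,
}

trait ShuffleResolver {
    fn should_resolve(&self, plan: &Plan, ctx: &ResolveContext) -> bool;
    fn resolve(&self, plan: &Plan, ctx: &ResolveContext) -> Result<Arc<Plan>, String>;
}

struct AnalyzeShuffleResolver;
struct DefaultShuffleResolver;

impl ShuffleResolver for AnalyzeShuffleResolver {
    // Only claims unresolved shuffles when the (assumed) analyze flag is set.
    fn should_resolve(&self, plan: &Plan, ctx: &ResolveContext) -> bool {
        ctx.is_analyze && matches!(plan, Plan::UnresolvedShuffle { .. })
    }

    fn resolve(&self, plan: &Plan, _ctx: &ResolveContext) -> Result<Arc<Plan>, String> {
        match plan {
            Plan::UnresolvedShuffle { stage_id } => {
                Ok(Arc::new(Plan::AnalyzeShuffleReader { stage_id: *stage_id }))
            }
            other => Err(format!("cannot resolve {other:?}")),
        }
    }
}

impl ShuffleResolver for DefaultShuffleResolver {
    // Fallback: claims every unresolved shuffle.
    fn should_resolve(&self, plan: &Plan, _ctx: &ResolveContext) -> bool {
        matches!(plan, Plan::UnresolvedShuffle { .. })
    }

    fn resolve(&self, plan: &Plan, _ctx: &ResolveContext) -> Result<Arc<Plan>, String> {
        match plan {
            Plan::UnresolvedShuffle { stage_id } => {
                Ok(Arc::new(Plan::ShuffleReader { stage_id: *stage_id }))
            }
            other => Err(format!("cannot resolve {other:?}")),
        }
    }
}

// Holds resolvers in priority order; the first one that accepts the plan wins.
struct ResolverManager {
    resolvers: Vec<Box<dyn ShuffleResolver>>,
}

impl ResolverManager {
    fn new() -> Self {
        Self {
            resolvers: vec![
                Box::new(AnalyzeShuffleResolver),
                Box::new(DefaultShuffleResolver),
            ],
        }
    }

    fn resolve(&self, plan: &Plan, ctx: &ResolveContext) -> Result<Arc<Plan>, String> {
        self.resolvers
            .iter()
            .find(|r| r.should_resolve(plan, ctx))
            .ok_or_else(|| format!("no resolver accepts {plan:?}"))?
            .resolve(plan, ctx)
    }
}
```

In this sketch, calling `ResolverManager::new().resolve(...)` on an `UnresolvedShuffle` picks `AnalyzeShuffleResolver` when `is_analyze` is true and falls through to `DefaultShuffleResolver` otherwise. Keeping the ordering in one `Vec` means a new strategy could be added without touching `remove_unresolved_shuffles()` itself, which is the main appeal of the idea.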
