danielhumanmod commented on issue #1344:
URL: https://github.com/apache/datafusion-ballista/issues/1344#issuecomment-3640412709

   > An alternative would be to add ability to register job completion callback which triggers change of job B once the job A finishes. Trigger can change UnresolvedShuffle to actual Exec (we could use same logic like we use for simple EXPLAIN).
   >
   > I believe job finish callback would be useful for other cases, I'm not sure at this point how complicated would it be to implement, may have a look this weekend.
   >
   > (https://github.com/apache/datafusion-ballista/pull/1309#issuecomment-3414552929)
   
   Hey @milenkovicm, I did some investigation into this approach. The current processing logic for the `QueryStageSchedulerEvent::TaskUpdating` event already behaves like a built-in callback chain:
   ```
   TaskUpdating event
     → update_task_statuses
     → update_task_status
        → update_stage_output_links
        → processing_stages_update
            → resolve_stage
                → remove_unresolved_shuffles
   ```
   
   This whole chain is tightly coupled, so extracting it into a standalone job-completion callback looks non-trivial.
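   For comparison, a standalone completion-callback registry in the spirit of the quoted suggestion might look like the minimal sketch below. All names (`JobCompletionCallbacks`, `register`, `job_finished`, `demo`) are hypothetical, not Ballista APIs, and the sketch deliberately skips the hard part noted above: hooking `job_finished` into the existing `TaskUpdating` processing.

   ```rust
   use std::cell::RefCell;
   use std::collections::HashMap;
   use std::rc::Rc;

   /// Hypothetical one-shot callbacks, keyed by the job id they wait on.
   #[derive(Default)]
   struct JobCompletionCallbacks {
       pending: HashMap<String, Vec<Box<dyn FnOnce(&str)>>>,
   }

   impl JobCompletionCallbacks {
       /// Run `callback` once the job `job_id` finishes.
       fn register(&mut self, job_id: &str, callback: Box<dyn FnOnce(&str)>) {
           self.pending.entry(job_id.to_string()).or_default().push(callback);
       }

       /// Called when `job_id` completes: fires and drops its callbacks,
       /// returning how many ran.
       fn job_finished(&mut self, job_id: &str) -> usize {
           let callbacks = self.pending.remove(job_id).unwrap_or_default();
           let fired = callbacks.len();
           for callback in callbacks {
               callback(job_id);
           }
           fired
       }
   }

   /// Usage sketch: job B's shuffle rewrite is deferred until job A finishes.
   fn demo() -> (usize, usize, Vec<String>) {
       let log = Rc::new(RefCell::new(Vec::new()));
       let mut callbacks = JobCompletionCallbacks::default();

       let log_for_b = Rc::clone(&log);
       callbacks.register(
           "A",
           Box::new(move |finished: &str| {
               // A scheduler would swap job B's UnresolvedShuffle for a real Exec here.
               log_for_b
                   .borrow_mut()
                   .push(format!("job {} done: resolve shuffles of job B", finished));
           }),
       );

       let first = callbacks.job_finished("A");
       let second = callbacks.job_finished("A"); // callbacks are one-shot
       let entries = log.borrow().clone();
       (first, second, entries)
   }
   ```

   Making callbacks `FnOnce` and removing them when they fire keeps a finished job from triggering the same rewrite twice.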
   
   However, I found that `remove_unresolved_shuffles()` is exactly where `UnresolvedShuffleExec` is replaced by `ShuffleReaderExec`. So another reasonable approach might be to add a lightweight abstraction that controls how shuffles get resolved: something like a `ShuffleResolver` trait that lets us plug in different strategies for replacing `UnresolvedShuffleExec`:
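   ```rust
   trait ShuffleResolver {
       /// Decide whether this resolver should handle the current stage.
       fn should_resolve(&self, ...) -> bool;
       /// Replace `UnresolvedShuffleExec` nodes in the stage plan.
       fn resolve(&self, ...) -> Result<Arc<dyn ExecutionPlan>>;
   }
   ```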
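   To make the `resolve` half more concrete, here is a minimal, self-contained sketch of a default resolver. It uses hypothetical stand-in types instead of DataFusion's `ExecutionPlan`: `Plan`, `StageContext`, and the `String` error type are assumptions for illustration. A real implementation would build `ShuffleReaderExec` from the finished stage's partition locations rather than a `ShuffleReader` enum variant.

   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;

   /// Hypothetical, simplified stand-in for a physical plan tree.
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       /// Placeholder for input from a stage that has not finished yet.
       UnresolvedShuffle { stage_id: usize },
       /// Reads the finished stage's shuffle output from the given locations.
       ShuffleReader { stage_id: usize, locations: Vec<String> },
       /// Any other operator, with its children.
       Operator { name: String, children: Vec<Arc<Plan>> },
   }

   /// Hypothetical per-stage context: output locations of finished stages.
   struct StageContext {
       locations: HashMap<usize, Vec<String>>,
   }

   trait ShuffleResolver {
       fn should_resolve(&self, ctx: &StageContext) -> bool;
       fn resolve(&self, plan: Arc<Plan>, ctx: &StageContext) -> Result<Arc<Plan>, String>;
   }

   /// Default strategy: swap every UnresolvedShuffle for a ShuffleReader.
   struct DefaultShuffleResolver;

   impl ShuffleResolver for DefaultShuffleResolver {
       fn should_resolve(&self, _ctx: &StageContext) -> bool {
           true // fallback strategy: accepts every stage
       }

       fn resolve(&self, plan: Arc<Plan>, ctx: &StageContext) -> Result<Arc<Plan>, String> {
           match plan.as_ref() {
               Plan::UnresolvedShuffle { stage_id } => {
                   // The upstream stage must already have reported its output locations.
                   let locations = ctx
                       .locations
                       .get(stage_id)
                       .ok_or_else(|| format!("stage {} has no shuffle output yet", stage_id))?;
                   Ok(Arc::new(Plan::ShuffleReader {
                       stage_id: *stage_id,
                       locations: locations.clone(),
                   }))
               }
               Plan::Operator { name, children } => {
                   // Rewrite children recursively, keeping the operator itself.
                   let children = children
                       .iter()
                       .map(|child| self.resolve(Arc::clone(child), ctx))
                       .collect::<Result<Vec<_>, _>>()?;
                   Ok(Arc::new(Plan::Operator { name: name.clone(), children }))
               }
               // Already resolved: leave as-is.
               Plan::ShuffleReader { .. } => Ok(Arc::clone(&plan)),
           }
       }
   }
   ```

   Returning an error when an upstream stage has no recorded output keeps the resolver from silently producing a reader with no locations.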
   
   The flow would look like this:
   ```
   remove_unresolved_shuffles()
     ↓
   resolver_manager.resolve()
     ↓
   select_resolver()
     ├─ AnalyzeShuffleResolver.should_resolve()?
     │  ├─ true → use this resolver and stop
     │  └─ false → continue
     │
     ├─ DefaultShuffleResolver.should_resolve()?
     │  └─ true → use this resolver and stop
     │
     ...
   ```
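   The selection step in this flow is essentially a chain of responsibility: resolvers are tried in order, and the first whose `should_resolve` returns true handles the stage. The sketch below assumes a hypothetical `ResolverManager` and a simplified `StageContext` with an `is_analyze` flag; the plan type is left generic (a `String` placeholder in the two toy resolvers) so it compiles without DataFusion.

   ```rust
   /// Hypothetical context a resolver inspects before claiming a stage.
   struct StageContext {
       /// Assumed flag marking stages of an EXPLAIN ANALYZE-style job.
       is_analyze: bool,
   }

   /// Same shape as the proposed trait, with the plan type left generic.
   trait ShuffleResolver<P> {
       fn name(&self) -> &'static str;
       fn should_resolve(&self, ctx: &StageContext) -> bool;
       fn resolve(&self, plan: P, ctx: &StageContext) -> Result<P, String>;
   }

   struct AnalyzeShuffleResolver;
   struct DefaultShuffleResolver;

   impl ShuffleResolver<String> for AnalyzeShuffleResolver {
       fn name(&self) -> &'static str {
           "analyze"
       }
       fn should_resolve(&self, ctx: &StageContext) -> bool {
           ctx.is_analyze
       }
       fn resolve(&self, plan: String, _ctx: &StageContext) -> Result<String, String> {
           Ok(format!("{} [resolved by {}]", plan, self.name()))
       }
   }

   impl ShuffleResolver<String> for DefaultShuffleResolver {
       fn name(&self) -> &'static str {
           "default"
       }
       fn should_resolve(&self, _ctx: &StageContext) -> bool {
           true // fallback: accepts every stage, so it must be registered last
       }
       fn resolve(&self, plan: String, _ctx: &StageContext) -> Result<String, String> {
           Ok(format!("{} [resolved by {}]", plan, self.name()))
       }
   }

   /// Tries resolvers in registration order; the first match wins.
   struct ResolverManager<P> {
       resolvers: Vec<Box<dyn ShuffleResolver<P>>>,
   }

   impl<P> ResolverManager<P> {
       fn select_resolver(&self, ctx: &StageContext) -> Option<&dyn ShuffleResolver<P>> {
           self.resolvers
               .iter()
               .find(|r| r.should_resolve(ctx))
               .map(|r| &**r)
       }

       fn resolve(&self, plan: P, ctx: &StageContext) -> Result<P, String> {
           match self.select_resolver(ctx) {
               Some(resolver) => resolver.resolve(plan, ctx),
               None => Err("no shuffle resolver accepted this stage".to_string()),
           }
       }
   }
   ```

   Registration order carries the policy here: because the default resolver accepts everything, placing it after the specialised ones makes it the fallback, and an empty manager reports an error instead of leaving `UnresolvedShuffleExec` in place.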

