danielhumanmod commented on PR #1428:
URL: 
https://github.com/apache/datafusion-ballista/pull/1428#issuecomment-4130960540

   > thanks @danielhumanmod, first of all sorry for late review.
   > 
   > I'm a bit puzzled which approach would be the best approach for this.
   > 
   > I'm not sure that just splitting jobs by logical plan will be sufficient.
   > One simple case: if we only have a job split registry, how can we
   > "communicate" the outcome of the first job to the second job?
   > 
   > For example, if job-a generates a set of exchange files, how can we share
   > their location with the dependent job (job-b) if the logical plan has
   > already been created?
   > 
   > Could we have some kind of callback mechanism that could be used to
   > describe dependency information and share job information:
   > 
   > ```
   > let job_a = job_a.and_then(|job_info| {
   >     job_b_definition(job_info)
   > });
   > 
   > submit(job_a);
   > ```
   > 
   > On success, `job_a` should invoke `and_then` and generate the definition
   > for job_b. This way job_b's logical plan could be something that can't be
   > derived from a composite logical plan of job_a + job_b, as is currently
   > the case with the overarching logical plan.
   > 
   > Or we could have multi-job dependencies:
   > 
   > ```
   > let job_a = job_a.and_then(|job_info| {
   >     job_b_definition(job_info).and_then(|...| {...})
   > });
   > 
   > submit(job_a);
   > ```
   > 
   > Open to suggestions, and thanks a lot for taking the time to drive this.
   
   @milenkovicm Sorry for the late reply; I've been busy with my full-time job
   recently. After investigating, the job callback + client-side approach
   seems reasonable!
   
   <img width="1046" height="728" alt="image" src="https://github.com/user-attachments/assets/ac6483ee-91af-4818-8ad7-fcd55f3d972f" />
   
   Core ideas are:
   1. The client intercepts `LogicalPlan::Analyze` and submits only the inner
      query to the scheduler.
   2. After the inner job completes, the client fetches its metrics via a new
      `GetJobMetrics` endpoint and formats the output.
   3. The scheduler stays completely unaware of the callback logic.
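
A minimal, self-contained sketch of the three steps above, under loud assumptions: `Plan` is a toy stand-in for DataFusion's `LogicalPlan`, and the `Scheduler` trait, `MockScheduler`, `OperatorMetrics`, and `get_job_metrics` are hypothetical names that only mirror the proposed `GetJobMetrics` endpoint. None of this is Ballista's actual API, and job submission is modelled as a blocking call for simplicity.

```rust
use std::collections::HashMap;

// Toy stand-in for DataFusion's LogicalPlan (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Analyze(Box<Plan>), // EXPLAIN ANALYZE wrapper around an inner query
    Query(String),      // any other plan, represented by its SQL text
}

// Hypothetical per-operator metrics, as the proposed GetJobMetrics might return.
#[derive(Debug, Clone, PartialEq)]
struct OperatorMetrics {
    operator: String,
    output_rows: u64,
    elapsed_ms: u64,
}

// Hypothetical client-facing view of the scheduler.
trait Scheduler {
    // Submits a plan and blocks until the job completes; returns the job id.
    fn submit(&mut self, plan: &Plan) -> String;
    // Mirrors the proposed GetJobMetrics endpoint.
    fn get_job_metrics(&self, job_id: &str) -> Vec<OperatorMetrics>;
}

// In-memory mock that records what it was asked to run.
struct MockScheduler {
    submitted: Vec<Plan>,
    metrics: HashMap<String, Vec<OperatorMetrics>>,
}

impl MockScheduler {
    fn new() -> Self {
        MockScheduler { submitted: Vec::new(), metrics: HashMap::new() }
    }
}

impl Scheduler for MockScheduler {
    fn submit(&mut self, plan: &Plan) -> String {
        self.submitted.push(plan.clone());
        let job_id = format!("job-{}", self.submitted.len());
        // Fixed fake metrics so the example is deterministic.
        let fake = OperatorMetrics { operator: "Scan".to_string(), output_rows: 3, elapsed_ms: 1 };
        self.metrics.insert(job_id.clone(), vec![fake]);
        job_id
    }

    fn get_job_metrics(&self, job_id: &str) -> Vec<OperatorMetrics> {
        self.metrics.get(job_id).cloned().unwrap_or_default()
    }
}

// Client-side formatting of the fetched metrics, one operator per line.
fn format_metrics(metrics: &[OperatorMetrics]) -> String {
    metrics
        .iter()
        .map(|m| format!("{}: output_rows={}, elapsed_ms={}", m.operator, m.output_rows, m.elapsed_ms))
        .collect::<Vec<_>>()
        .join("\n")
}

// Client-side entry point: the scheduler never sees the Analyze node.
fn execute<S: Scheduler>(scheduler: &mut S, plan: Plan) -> String {
    match plan {
        Plan::Analyze(inner) => {
            let job_id = scheduler.submit(&inner);             // step 1: inner query only
            let metrics = scheduler.get_job_metrics(&job_id);  // step 2: fetch metrics
            format_metrics(&metrics)                           // step 2: format on the client
        }
        other => {
            let job_id = scheduler.submit(&other);
            format!("submitted {}", job_id)
        }
    }
}

fn main() {
    let mut scheduler = MockScheduler::new();
    let plan = Plan::Analyze(Box::new(Plan::Query("SELECT 1".to_string())));
    println!("{}", execute(&mut scheduler, plan));
}
```

Because the `Analyze` handling lives entirely in `execute`, the mock scheduler only ever records the inner `Query` plan, which is the sense in which step 3 keeps the scheduler unaware of the callback logic.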


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

