[GitHub] [arrow-ballista] thinkharderdev commented on issue #332: Introduce TaskGroups to take advantage of shared memory between task slots

via GitHub Sat, 15 Apr 2023 04:39:08 -0700


thinkharderdev commented on issue #332:
URL: https://github.com/apache/arrow-ballista/issues/332#issuecomment-1509738861


   Started working on this and it's in some ways more straightforward than I 
had thought (you can see a working, but hack since it avoids refactoring as 
much as possible on our fork at 
https://github.com/coralogix/arrow-ballista/pull/49). In that PR we sort of 
fake it by just sending a task status update for all the partitions in the task 
group where we send an empty set of shuffle locations for all but one task in 
the group. 
   
   But to do this properly we probably need to do a bit of refactoring. The big 
thing would be to decouple the concept of a `Task` from a single plan 
partition. So `TaskDescription` would change to something like
   
   ```
   pub struct TaskDescription {
       pub session_id: String,
       pub partition: Vec<PartitionId>,
       pub stage_attempt_num: usize,
       pub task_id: usize,
       pub plan: Arc<dyn ExecutionPlan>,
       pub output_partitioning: Option<Partitioning>,
   }
   ```
   
   And then we would use the `task_id` to connect everything together and rely 
on the `(job_id,task_id)` tuple as a unique identifier when referencing a 
running task (eg when cancelling running tasks) which should be adequate as 
`task_id` is guaranteed to be unique within a single job. 
   
   The only thing that I believe will not fit from the current implementation 
is tracking task-level failures. In this model a `Task` would be a transient 
entity as which partitions are included in any given task will be determined by 
how many executor slots are available at the moment it is scheduled and there 
would be no guarantee that if a task fails that the same set of partitions will 
be grouped together in the next task scheduled. We could fake it by 
incrementing the task failure count for a partition anytime a task that 
included that partition fails, but I'm not sure that adds any incremental value 
over just considering failures/retries at the stage level. So I would suggest 
that we only track failures at the stage level (as we already do) and no longer 
try and track individual partition attempts. 
   
   @yahoNanJing @Dandandan @andygrove I plan on working on this next week so 
let me know if this approach seems unreasonable :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-ballista] thinkharderdev commented on issue #332: Introduce TaskGroups to take advantage of shared memory between task slots

Reply via email to