yahoNanJing commented on PR #823:
URL: https://github.com/apache/arrow-ballista/pull/823#issuecomment-1617126925

   Thanks @thinkharderdev for your comments. 
   
   > I'm still a little confused as to why this is required to enable caching. 
   
   For consistent hashing based task assignment, we should assign a task 
based on the files it scans, if there are any. The details are 
described in #833. This means it's necessary to assign a specific executor to 
a task rather than assign a random task to an executor.
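
A minimal sketch of what assigning by scan file could look like, assuming a plain hash ring with virtual nodes. `HashRing`, `hash_of`, and `executor_for` are hypothetical names used only for illustration; they are not Ballista's API, and the actual design is the one described in #833.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical hash ring (not a Ballista type): maps ring positions to executor ids.
struct HashRing {
    ring: BTreeMap<u64, String>,
}

/// Hash any value with the standard library's default hasher.
/// Deterministic within one build, but not guaranteed stable across Rust versions.
fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl HashRing {
    /// Place each executor on the ring `replicas` times (virtual nodes).
    fn new(executors: &[&str], replicas: usize) -> Self {
        let mut ring = BTreeMap::new();
        for executor in executors {
            for replica in 0..replicas {
                ring.insert(hash_of(&(executor, replica)), executor.to_string());
            }
        }
        Self { ring }
    }

    /// Pick the executor for a task from the file it scans.
    /// Returns `None` when the task has no scan file, so the caller
    /// can fall back to any other assignment policy.
    fn executor_for(&self, scan_file: Option<&str>) -> Option<&str> {
        let key = hash_of(scan_file?);
        self.ring
            .range(key..)
            .next()
            // Wrap around to the first node when the key is past the last one.
            .or_else(|| self.ring.iter().next())
            .map(|(_, executor)| executor.as_str())
    }
}
```

The lookup starts from the task (its scan file) and yields an executor, which is why the assignment direction has to be task to executor rather than executor to task.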
   
   
   > The original goal of the ExecutorReservation was to minimize contention on 
the task slots state. 
   
   I totally understand the purpose of `ExecutorReservation`. However, the 
current implementation does not actually reduce the contention much. 
https://github.com/apache/arrow-ballista/blob/b65464e4b73590470fa69aad5b6954300ad243a0/ballista/scheduler/src/state/mod.rs#L190-L228
   
   From the above code, if there are still pending tasks, it still 
invokes `reserve_slots`.
   
   To reduce resource or lock contention, I'll raise another PR, based on 
this one, that refactors the event processing to introduce batch event 
processing. For example, it could combine 10 task status update events into 
one so that the shared state is contended for only once. Sample code can be found 
[here](https://github.com/yahoNanJing/arrow-ballista/blob/dev-20230510/ballista/scheduler/src/state/event_action.rs)
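
To make the batching idea concrete, here is a rough sketch under my own assumptions; it is not the code in the linked branch. `TaskStatusUpdate`, `SlotState`, `apply_batch`, and `process_events` are hypothetical names, and the `lock_acquisitions` counter exists only to make the number of lock acquisitions visible.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical event: a finished task frees some slots on an executor.
struct TaskStatusUpdate {
    executor_id: String,
    freed_slots: u32,
}

/// Hypothetical shared slot state guarded by a single lock.
#[derive(Default)]
struct SlotState {
    available: Mutex<HashMap<String, u32>>,
    /// Counts lock acquisitions, for illustration only.
    lock_acquisitions: AtomicUsize,
}

impl SlotState {
    /// Apply a whole batch of updates while taking the lock once.
    fn apply_batch(&self, batch: &[TaskStatusUpdate]) {
        if batch.is_empty() {
            return;
        }
        // Merge the updates per executor before touching shared state.
        let mut merged: HashMap<&str, u32> = HashMap::new();
        for update in batch {
            *merged.entry(update.executor_id.as_str()).or_insert(0) += update.freed_slots;
        }
        // One lock acquisition for the whole batch.
        let mut available = self.available.lock().unwrap();
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        for (executor_id, freed) in merged {
            *available.entry(executor_id.to_string()).or_insert(0) += freed;
        }
    }
}

/// Process events in batches of at most `batch_size` (treated as at least 1).
fn process_events(state: &SlotState, events: Vec<TaskStatusUpdate>, batch_size: usize) {
    for batch in events.chunks(batch_size.max(1)) {
        state.apply_batch(batch);
    }
}
```

With 10 events and a batch size of 10, the lock is taken once instead of 10 times; a batch size of 1 degenerates to per-event processing.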
  


