westonpace opened a new pull request, #14524:
URL: https://github.com/apache/arrow/pull/14524

   …ire a call to End
   
   This call to end was a foot-gun because failing to call End on a scheduler 
would lead to deadlock.  This led to numerous situations where one would 
accidentally fail to call End because of some kind of error or exceptional 
behavior.
   
   Now the call to end is no longer required.  Schedulers are semantically 
divided (there aren't actually three different classes, just three different 
supported use cases) into three different types:
   
    * Destroy-when-done schedulers destroy themselves whenever they have 
finished running all of their tasks. These schedulers require an initial task 
to bootstrap the scheduler and then that task (and it's subtasks) can add more 
sub-tasks.  This is the simplest and most basic kind of scheduler and the 
top-level scheduler must always be this kind of scheduler.
    * Peer schedulers will be ended when their parent scheduler runs out of 
tasks.  These are useful for sinks which get tasks on a push-driven basis and 
don't have all their tasks up front.  However, when the plan is done, we know 
for certain no more tasks will arrive.
    * Eager peer schedulers are the same as peer schedulers but they might be 
eagerly ended before their parent scheduler is finished.  The file queue in the 
dataset writer is a good example.  It receives tasks in an ad-hoc fashion so it 
is a peer scheduler.  However, once we have written max_rows rows to a file we 
can end it early (no need to wait for the plan to finish).  However, if we 
forget to end it early (or fail to due so in some error scenario) it will still 
end when the plan ends.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to