westonpace opened a new pull request, #14524:
URL: https://github.com/apache/arrow/pull/14524
…ire a call to End
This call to end was a foot-gun because failing to call End on a scheduler
would lead to deadlock. This led to numerous situations where one would
accidentally fail to call End because of some kind of error or exceptional
behavior.
Now the call to end is no longer required. Schedulers are semantically
divided (there aren't actually three different classes, just three different
supported use cases) into three different types:
* Destroy-when-done schedulers destroy themselves whenever they have
finished running all of their tasks. These schedulers require an initial task
to bootstrap the scheduler and then that task (and it's subtasks) can add more
sub-tasks. This is the simplest and most basic kind of scheduler and the
top-level scheduler must always be this kind of scheduler.
* Peer schedulers will be ended when their parent scheduler runs out of
tasks. These are useful for sinks which get tasks on a push-driven basis and
don't have all their tasks up front. However, when the plan is done, we know
for certain no more tasks will arrive.
* Eager peer schedulers are the same as peer schedulers but they might be
eagerly ended before their parent scheduler is finished. The file queue in the
dataset writer is a good example. It receives tasks in an ad-hoc fashion so it
is a peer scheduler. However, once we have written max_rows rows to a file we
can end it early (no need to wait for the plan to finish). However, if we
forget to end it early (or fail to due so in some error scenario) it will still
end when the plan ends.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]