[GitHub] [arrow-datafusion] crepererum opened a new issue, #4865: Improve repartition buffering

GitBox Tue, 10 Jan 2023 01:18:28 -0800


crepererum opened a new issue, #4865:
URL: https://github.com/apache/arrow-datafusion/issues/4865


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   In #4820 @alamb and I discussed that the repartition node could use 
slightly smarter buffering. This is a tracking issue for that.
   
   **Describe the solution you'd like**
   While the repartition node needs an unbounded buffer to prevent deadlocks, 
it doesn't need to buffer an unlimited amount of data in all cases. To be precise: 
if ALL output channels have data (i.e. are not empty), then the input workers 
can be paused. However, if at least one output channel is empty, we need to 
keep driving the input workers, because the data that channel's consumer is 
waiting for can only arrive through them. In the worst case, a few channels will 
fill up with unbounded data while one channel stays empty forever. Realistically, 
this will not happen for any reasonable repartition configuration.
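   The rule above can be sketched as a small gate shared by the input workers and the output consumers. This is a minimal, single-process Python illustration under stated assumptions, not DataFusion's actual Rust implementation: `DistributionGate`, `wait_until_needed`, `send`, `recv` and `inputs_paused` are hypothetical names, output channels are modeled as unbounded `deque`s guarded by one `threading.Condition`, and consumers poll instead of awaiting.

```python
import threading
from collections import deque


class DistributionGate:
    """Hypothetical sketch of the proposed repartition buffering rule.

    Each output channel is an unbounded FIFO, so sending never blocks on a
    single full channel (the property that avoids deadlocks). Input workers
    are paused only while EVERY output channel already holds data.
    """

    def __init__(self, num_outputs):
        self._channels = [deque() for _ in range(num_outputs)]
        self._cond = threading.Condition()

    def _any_empty(self):
        # Must be called with self._cond held.
        return any(not ch for ch in self._channels)

    def inputs_paused(self):
        """True iff all output channels are non-empty."""
        with self._cond:
            return not self._any_empty()

    def wait_until_needed(self, timeout=None):
        """Block an input worker until at least one output channel is empty.

        Returns False if the timeout expired while inputs were still paused.
        """
        with self._cond:
            return self._cond.wait_for(self._any_empty, timeout)

    def send(self, partition, batch):
        """Append a batch to an output channel; never blocks."""
        with self._cond:
            self._channels[partition].append(batch)

    def recv(self, partition):
        """Pop the next batch for a consumer, or None if its channel is empty."""
        with self._cond:
            ch = self._channels[partition]
            if not ch:
                return None
            batch = ch.popleft()
            if not ch:
                # This channel just drained, so paused input workers must resume.
                self._cond.notify_all()
            return batch

    def buffered(self):
        """Current number of buffered batches per output channel."""
        with self._cond:
            return [len(ch) for ch in self._channels]


if __name__ == "__main__":
    gate = DistributionGate(3)
    for p in range(3):
        gate.send(p, f"batch-{p}")
    print(gate.inputs_paused())  # True: every channel has data
    # A hypothetical input worker that blocks until some channel is empty.
    worker = threading.Thread(target=gate.wait_until_needed)
    worker.start()
    gate.recv(1)  # channel 1 drains, which wakes the worker
    worker.join()
    print(gate.buffered())  # [1, 0, 1]
```

Note that the gate alone does not bound memory in the worst case described above: if partition 2 never receives data, `wait_until_needed` keeps returning immediately while channels 0 and 1 grow without limit.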
   
   **Describe alternatives you've considered**
   Keeping the current state.
   
   **Additional context**
   -
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.