alamb commented on PR #6929:
URL: 
https://github.com/apache/arrow-datafusion/pull/6929#issuecomment-1636443457

   >  If it turns out that bounding memory usage inevitably reduces performance 
in a non-negligible way, I propose we introduce a configuration flag to control 
this. We can use the high-performance/unbounded behavior the default one, but 
one should still be able to choose the lower performance/bounded version for 
memory conscious use cases.
   
   I don't think we should ever use unbounded memory if we can avoid it -- in 
this case, if the producer goes faster than the consumer, it will just buffer a 
huge amount of data (and will eventually OOM with, e.g., TPCH SF100 or 
SF1000).
   
   I like @Dandandan's suggestion to introduce more buffering.
   
   Perhaps we could extend the existing DistributionSender to hold a small 
queue (of 2 or 3 items, for example) rather than just a single `Option<>`, so 
that it is possible to start fetching the next input immediately:
   
   
https://github.com/apache/arrow-datafusion/blob/d316702722e6c301fdb23a9698f7ec415ef548e9/datafusion/core/src/physical_plan/repartition/distributor_channels.rs#L180-L182
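
   To make the idea concrete, here is a rough sketch of what "a small queue 
instead of a single `Option<>`" could look like. `BoundedSlot`, `try_push` and 
`pop` are hypothetical names used only for illustration; this is not the actual 
`distributor_channels.rs` code, and it leaves out the waker / async plumbing 
entirely.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the buffer inside a channel: holds up to
/// `capacity` items instead of a single `Option<T>`.
struct BoundedSlot<T> {
    queue: VecDeque<T>,
    capacity: usize,
}

impl<T> BoundedSlot<T> {
    /// Create a slot that buffers at most `capacity` items (e.g. 2 or 3).
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            queue: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Try to enqueue an item. When the slot is full the item is handed
    /// back in `Err`, so the producer has to wait (backpressure) and
    /// memory usage stays bounded.
    fn try_push(&mut self, item: T) -> Result<(), T> {
        if self.queue.len() >= self.capacity {
            return Err(item);
        }
        self.queue.push_back(item);
        Ok(())
    }

    /// Dequeue the oldest item, if any, freeing room for the producer.
    fn pop(&mut self) -> Option<T> {
        self.queue.pop_front()
    }
}
```

   With a capacity of 2, the producer can push two batches without waiting for 
the consumer; the third `try_push` returns the batch in `Err` until the consumer 
calls `pop`. That keeps the amount of buffered data bounded while still letting 
the producer start on the next input right away.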


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
