crepererum commented on PR #6929: URL: https://github.com/apache/arrow-datafusion/pull/6929#issuecomment-1637677431
@YjyJeff Please read the reasoning here before experimenting with unbounded constructs: https://github.com/apache/arrow-datafusion/blob/49583bd5010282ca126e75100dce958aa346e5ee/datafusion/core/src/physical_plan/repartition/distributor_channels.rs#L18-L39 Mainly, the reason that we are NOT using unbounded channels is that this just buffers large (potentially unbounded) quantities of data in many scenarios. Mostly this happens when the repartition producer side (e.g. trivial data reading like highly compressed parquet files) is way faster than the consumer side (e.g. complicated transforms or costly aggregations). Note that I'm NOT saying that the current impl. is good (or even perfect), there are likley better options. But just slapping an unbounded channel on this problem is not going to solve it. Sure that wins in some micro-benchmarking but it fails to provide a robust foundation for query execution.[^1] I agree w/ @ozankabak & Co though that SOME buffering is OK. So I think the following config option would be robust, reasonably fast and unsurprising to the user: A option "repartition buffer bytes OR messages" (or similar name) that limits bytes or messages per channel (not per repartition, otherwise the cross-comm overhead is too high) and only if this limit is met we fall back to the cross-channel gating behavior described in the code comment linked above (namely: we let data flow freely if there's at least 1 empty channel). Ref #4867 and #4865. [^1]: If someone finds a good metric, benchmark, or test for robustness, please open a ticket / PR. I am quite unhappy that this is currently mostly based on engineering intuition. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
