ozankabak commented on code in PR #5754:
URL: https://github.com/apache/arrow-datafusion/pull/5754#discussion_r1156250272
##########
datafusion/common/src/config.rs:
##########
@@ -280,6 +280,10 @@ config_namespace! {
/// using the provided `target_partitions` level
pub repartition_joins: bool, default = true
+ /// Should DataFusion allow symmetric hash joins for unbounded data sources even when
+ /// its inputs do not have any ordering or filtering
+ pub allow_symmetric_joins_without_pruning: bool, default = true
Review Comment:
Without pruning, SHJ will still produce correct results, but it will use twice
as much memory (assuming inputs are of the same size) for no gain except
pipelining.
Some more explanation about this option: it is not always possible to detect
with 100% accuracy whether pruning may occur; the system may conclude that
pruning is not possible when it actually is. Therefore, one would enable this
option if they have a priori knowledge that the data does lend itself to
pruning.
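
To make the memory claim concrete, here is a minimal, std-only sketch; it is not DataFusion's implementation. The function names `classic_hash_join` and `symmetric_hash_join` and the row type `(u32, &str)` are hypothetical, chosen only for illustration. A classic hash join buffers only the build side, while a symmetric hash join that never prunes buffers every row from both sides.

```rust
use std::collections::HashMap;

/// Toy classic hash join: buffer the whole build side, then stream the probe side.
/// Returns (number of matching pairs, number of rows held in memory).
pub fn classic_hash_join(build: &[(u32, &str)], probe: &[(u32, &str)]) -> (usize, usize) {
    let mut table: HashMap<u32, Vec<&str>> = HashMap::new();
    for &(key, val) in build {
        table.entry(key).or_default().push(val);
    }
    // Only the build side is ever buffered.
    let buffered = build.len();
    let matches = probe
        .iter()
        .map(|(key, _)| table.get(key).map_or(0, |rows| rows.len()))
        .sum::<usize>();
    (matches, buffered)
}

/// Toy symmetric hash join without pruning: rows arrive interleaved from both
/// sides; each row first probes the other side's table, then is buffered in
/// its own table. Nothing is ever evicted.
/// Returns (number of matching pairs, number of rows held in memory).
pub fn symmetric_hash_join(left: &[(u32, &str)], right: &[(u32, &str)]) -> (usize, usize) {
    let mut left_table: HashMap<u32, Vec<&str>> = HashMap::new();
    let mut right_table: HashMap<u32, Vec<&str>> = HashMap::new();
    let mut matches = 0;
    let mut buffered = 0;
    for i in 0..left.len().max(right.len()) {
        if let Some(&(key, val)) = left.get(i) {
            matches += right_table.get(&key).map_or(0, |rows| rows.len());
            left_table.entry(key).or_default().push(val);
            buffered += 1;
        }
        if let Some(&(key, val)) = right.get(i) {
            matches += left_table.get(&key).map_or(0, |rows| rows.len());
            right_table.entry(key).or_default().push(val);
            buffered += 1;
        }
    }
    (matches, buffered)
}
```

With two inputs of equal length, both functions report the same number of matches, but the symmetric variant holds twice as many rows. The symmetric variant can emit matches as rows arrive (the pipelining gain), and pruning is what would let it drop buffered rows early.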