[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5754: Improving optimizer performance by eliminating unnecessary sort and distribution passes, add more SymmetricHashJoin improvements

via GitHub Mon, 03 Apr 2023 10:28:58 -0700


ozankabak commented on code in PR #5754:
URL: https://github.com/apache/arrow-datafusion/pull/5754#discussion_r1156250272



##########
datafusion/common/src/config.rs:
##########
@@ -280,6 +280,10 @@ config_namespace! {
         /// using the provided `target_partitions` level
         pub repartition_joins: bool, default = true
 
+        /// Should DataFusion allow symmetric hash joins for unbounded data sources even when
+        /// its inputs do not have any ordering or filtering
+        pub allow_symmetric_joins_without_pruning: bool, default = true

Review Comment:
   SHJ will always produce correct results, but when no pruning occurs it will
   use twice as much memory (assuming inputs are of the same size) for no gain
   except pipelining.

   Some more explanation about this option: it is not always possible to detect
   with 100% accuracy whether pruning may occur. The system may think pruning
   is not possible when it actually is. Therefore, one would enable this option
   if they have a priori knowledge that the data does lend itself to pruning.
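
To make the trade-off above concrete, here is a minimal sketch in plain Rust. It is not DataFusion's planner code: the function names and the decision rule are hypothetical, chosen only to illustrate the comment. It assumes a classic hash join buffers just its build side, while a symmetric hash join that cannot prune keeps both sides in memory.

```rust
/// Hypothetical planner-side check (an assumption, not DataFusion's actual
/// logic): pick a symmetric hash join when pruning was detected, or when the
/// user opted in via a flag such as `allow_symmetric_joins_without_pruning`.
fn use_symmetric_hash_join(pruning_detected: bool, allow_without_pruning: bool) -> bool {
    pruning_detected || allow_without_pruning
}

/// Rows buffered by a classic hash join: only the build side is held in memory.
fn hash_join_buffered_rows(build_rows: u64, _probe_rows: u64) -> u64 {
    build_rows
}

/// Rows buffered by a symmetric hash join when nothing is ever pruned:
/// both inputs are kept in their hash tables.
fn shj_buffered_rows_without_pruning(left_rows: u64, right_rows: u64) -> u64 {
    left_rows + right_rows
}
```

With two inputs of one million rows each, the classic join buffers one million rows and the unpruned symmetric join buffers two million, which is the "twice as much memory" case. The flag only matters when pruning was not detected: with the flag off the sketch rejects the symmetric join, with it on the join is allowed.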



