comphead commented on code in PR #18209:
URL: https://github.com/apache/datafusion/pull/18209#discussion_r2452679557
##########
docs/source/user-guide/configs.md:
##########
@@ -253,3 +253,57 @@ SET datafusion.execution.batch_size = 1024;
 ```
 
 [`fairspillpool`]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html
+
+## Join Queries
+
+Currently Apache Datafusion supports the following join algorithms:
+
+- Nested Loop Join
+- Sort Merge Join
+- Hash Join
+- Symmetric Hash Join
+- Piecewise Merge Join (experimental)
+
+The physical planner will choose the appropriate algorithm based on the statistics + join
+condition of the two tables.
+
+# Join Algorithm Optimizer Configurations
+
+You can modify join optimization behavior in your queries by setting specific configuration values.
+Use the following command to update a configuration:
+
+```
+set datafusion.optimizer.<configuration_name>
+```
+
+Adjusting the following configuration values influences how the optimizer selects the join algorithm
+used to execute your SQL query:
+
+## Join Optimizer Configurations
+
+Adjusting the following configuration values influences how the optimizer selects the join algorithm
+used to execute your SQL query.
+
+### allow_symmetric_joins_without_pruning (bool, default = true)
+
+Controls whether symmetric hash joins are allowed for unbounded data sources even when their inputs
+lack ordering or filtering.
+
+- If disabled, the `SymmetricHashJoin` operator cannot prune its internal buffers to be produced only at the end of execution.
+
+### prefer_hash_join (bool, default = true)
+
+Determines whether the optimizer prefers Hash Join over Sort Merge Join during physical plan selection.
+
+- true: favors HashJoin for faster execution when sufficient memory is available.
+- false: allows SortMergeJoin to be chosen when more memory-efficient execution is needed.
+
+### enable_piecewise_merge_join (bool, default = false)

Review Comment:
btw, you may also want to include it in the tpch CLI utility, so people can test TPC queries with this kind of join
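
For anyone wanting to try these options from a SQL session in the meantime, a minimal sketch follows. It assumes each option sits under the `datafusion.optimizer.` prefix quoted in the hunk and takes a value in the same `SET name = value` form as the `batch_size` line in the hunk header; the values chosen here are only illustrative.

```sql
-- Sketch only: option names are taken from the headings in the diff above,
-- and the datafusion.optimizer prefix is assumed to apply to each of them.

-- Let the planner consider Sort Merge Join instead of preferring Hash Join.
SET datafusion.optimizer.prefer_hash_join = false;

-- Opt in to the experimental Piecewise Merge Join (off by default per the diff).
SET datafusion.optimizer.enable_piecewise_merge_join = true;
```

Running `EXPLAIN` on a join query before and after changing these values should show whether the chosen join operator differs.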
