Hi All,

This is my first time contributing, so I am reaching out by email before 
creating a JIRA ticket and PR. I would like to propose a small enhancement to 
OptimizeSkewedJoin.

Currently, OptimizeSkewedJoin has a hardcoded limit for multi-table joins 
(limit = 2). For queries that involve more than two joined tables (n > 2), 
OptimizeSkewedJoin will only be considered for two of the n joins.

The code comment suggests the limit is set to 2 because there would be too 
many complex combinations to consider for multi-table joins. However, it would 
be good to let users override this via a Spark configuration property, since 
how much complexity is acceptable is use-case dependent.
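
For context, the guard I am referring to looks roughly like the sketch below. 
This is a paraphrase from memory, not verbatim source; the exact structure and 
helper names (collectShuffleStages, optimizeSkewJoin) may differ between Spark 
versions.

    // Rough paraphrase of the guard in OptimizeSkewedJoin (not verbatim;
    // exact structure varies between Spark versions).
    val shuffleStages = collectShuffleStages(plan)
    if (shuffleStages.length == 2) {
      // Only the two-table case is handled today; the code comment cites the
      // explosion of split combinations for multi-table joins as the reason.
      optimizeSkewJoin(plan)
    } else {
      plan
    }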

Proposal:
- Add spark.sql.adaptive.skewJoin.maxMultiTableJoin (default = 2) to SQLConf.
- Update OptimizeSkewedJoin to respect the new configuration.
- If the user sets a value greater than 2, log a warning about the added 
  planning complexity (see the sketch below).
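
Concretely, the change could look something like the following sketch. The 
config name follows the proposal above, but the doc text, the .version() 
placeholder, and the exact wiring inside OptimizeSkewedJoin are illustrative 
only, not a final implementation.

    // Hypothetical SQLConf entry for the proposed setting (doc text and
    // version string are placeholders).
    val SKEW_JOIN_MAX_MULTI_TABLE_JOIN =
      buildConf("spark.sql.adaptive.skewJoin.maxMultiTableJoin")
        .doc("Maximum number of shuffle stages (joined tables) that " +
          "OptimizeSkewedJoin will attempt to optimize together. Values above " +
          "2 can produce many split combinations and increase planning cost.")
        .version("x.y.z")
        .intConf
        .checkValue(_ >= 2, "The value must be at least 2.")
        .createWithDefault(2)

    // Illustrative use inside OptimizeSkewedJoin: replace the hardcoded
    // `== 2` check with the configured limit, warning when the user raises it.
    val maxTables = conf.getConf(SQLConf.SKEW_JOIN_MAX_MULTI_TABLE_JOIN)
    if (maxTables > 2) {
      logWarning(s"spark.sql.adaptive.skewJoin.maxMultiTableJoin is set to " +
        s"$maxTables; optimizing skew across more than two joined tables can " +
        "significantly increase the number of split combinations considered.")
    }
    if (shuffleStages.length >= 2 && shuffleStages.length <= maxTables) {
      optimizeSkewJoin(plan)
    } else {
      plan
    }

Keeping the default at 2 would preserve today's behaviour, so the change would 
be opt-in for users who want to trade planning time for wider skew handling.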

If people think this would be a useful change, please let me know and I will 
proceed with the JIRA ticket and PR.

Kind Regards,

Alfie
