Hi All,

This is my first time contributing, so I'm reaching out by email before creating a JIRA ticket and PR. I would like to propose a small change/enhancement to OptimizeSkewedJoin.
Currently, OptimizeSkewedJoin has a hardcoded limit for multi-table joins (limit = 2). For queries that involve more than two joins (n > 2), OptimizeSkewedJoin is only considered for two of the n joins. A code comment suggests the limit is currently defaulted to 2 because there are too many complex combinations to consider; however, it would be good to allow users to override/configure this via the Spark config, since the acceptable complexity can be use-case dependent.

Proposal:
- Add spark.sql.adaptive.skewJoin.maxMultiTableJoin (default = 2) to SQLConf
- Update OptimizeSkewedJoin to respect the above configuration
- If a user sets a value > 2, log a warning to indicate the added complexity

A rough sketch of what this could look like is included below. If people think this is a good idea and useful, please let me know and I will proceed.
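For illustration only, here is a minimal sketch of the proposed SQLConf entry and how OptimizeSkewedJoin might read it. The config key, entry name, and validation are just my proposal (not an existing Spark config), and the exact wiring in SQLConf.scala / OptimizeSkewedJoin.scala would of course follow the surrounding code:

// Sketch: proposed entry in SQLConf.scala, next to the existing
// spark.sql.adaptive.skewJoin.* configs. Name and default are proposed,
// not an existing Spark config.
val SKEW_JOIN_MAX_MULTI_TABLE_JOIN =
  buildConf("spark.sql.adaptive.skewJoin.maxMultiTableJoin")
    .doc("The maximum number of joined tables for which OptimizeSkewedJoin " +
      "will attempt skew optimization. Values greater than 2 can greatly " +
      "increase the number of split combinations considered during planning.")
    .intConf
    .checkValue(_ >= 2, "The maximum number of joined tables must be at least 2.")
    .createWithDefault(2)

// Sketch: inside OptimizeSkewedJoin, the hardcoded limit of 2 would be
// replaced by the configured value, with a warning for values above 2.
val maxJoinedTables = conf.getConf(SQLConf.SKEW_JOIN_MAX_MULTI_TABLE_JOIN)
if (maxJoinedTables > 2) {
  logWarning("spark.sql.adaptive.skewJoin.maxMultiTableJoin is set to " +
    s"$maxJoinedTables; handling more than two joined tables may be " +
    "significantly more expensive to plan.")
}

Kind Regards,
Alfie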