andygrove opened a new pull request, #1902:
URL: https://github.com/apache/datafusion-ballista/pull/1902

   # Which issue does this PR close?
   
   Closes #1901.
   
   # Rationale for this change
   
   Client-set `datafusion.*` session config (e.g.
   `SET datafusion.optimizer.prefer_hash_join = true`, or `-c` overrides in the
   tpch benchmark) had no effect on scheduler-side planning, while `ballista.*`
   settings in the same session worked.
   
   The cause is `SessionConfigExt::upgrade_for_ballista`, which calls
   `ballista_restricted_configuration()` to apply Ballista's opinionated
   DataFusion defaults (`prefer_hash_join = false`,
   `hash_join_single_partition_threshold = 0`, the Utf8View flags).
   `SessionConfig::new_with_ballista()` already applies these once at
   construction. When `remote_with_state` later calls `upgrade_for_ballista`
   again, the defaults are re-applied *after* the user has set their own values,
   silently reverting them. `ballista.*` settings survive only because the
   restricted config does not touch them.
   
   Notably, the restricted-config comments state that users can opt back into
   `prefer_hash_join` and override the view-type flags via `SET` — but the
   re-application defeated exactly that.
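
The ordering problem can be reproduced with a toy model. The sketch below is not Ballista code: `ToyConfig`, `apply_restricted_defaults`, `upgrade_unconditional`, and `replay_session` are hypothetical names, the config is reduced to a flat string map, and `ballista.job.name` is used only as an example of a key the defaults do not touch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a session config: a flat key/value map.
/// This is NOT DataFusion's `SessionConfig`.
#[derive(Debug, Default, Clone)]
struct ToyConfig {
    options: HashMap<String, String>,
}

impl ToyConfig {
    fn set(&mut self, key: &str, value: &str) {
        self.options.insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.options.get(key).map(String::as_str)
    }
}

/// Models applying the restricted defaults: the listed keys are
/// overwritten unconditionally, every other key is left alone.
fn apply_restricted_defaults(cfg: &mut ToyConfig) {
    cfg.set("datafusion.optimizer.prefer_hash_join", "false");
    cfg.set("datafusion.optimizer.hash_join_single_partition_threshold", "0");
}

/// Models the pre-fix behaviour (assumption): every upgrade call
/// re-applies the defaults, regardless of what the user has set.
fn upgrade_unconditional(mut cfg: ToyConfig) -> ToyConfig {
    apply_restricted_defaults(&mut cfg);
    cfg
}

/// Replays the sequence described above: construct, let the user
/// override, then upgrade a second time.
fn replay_session() -> ToyConfig {
    // 1. Construction applies the defaults once.
    let mut cfg = upgrade_unconditional(ToyConfig::default());
    // 2. The user overrides one restricted key and one untouched key.
    cfg.set("datafusion.optimizer.prefer_hash_join", "true");
    cfg.set("ballista.job.name", "my-job");
    // 3. A later upgrade silently reverts the restricted key.
    upgrade_unconditional(cfg)
}
```

In this model, `replay_session()` returns a config where `prefer_hash_join` is back to `false` while `ballista.job.name` keeps the user's value, matching the asymmetry reported in the issue.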
   
   # What changes are included in this PR?
   
   - `upgrade_for_ballista` now applies `ballista_restricted_configuration()`
     only when the config has not already been through Ballista setup (detected
     by the absence of the `BallistaConfig` extension). A config that already
     carries the extension keeps the user's values.
   - Tests: a user override of `prefer_hash_join` /
     `hash_join_single_partition_threshold` survives `upgrade_for_ballista`; a
     plain config still receives Ballista's defaults on upgrade.
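
The guard can be illustrated with a simplified, standalone model. This is a sketch, not the code in this PR: `ToyConfig` and `upgrade_guarded` are hypothetical, the presence of the `BallistaConfig` extension is reduced to a boolean, and the model assumes the upgrade marks the config as set up the first time it runs.

```rust
use std::collections::HashMap;

const PREFER_HASH_JOIN: &str = "datafusion.optimizer.prefer_hash_join";
const SINGLE_PARTITION_THRESHOLD: &str =
    "datafusion.optimizer.hash_join_single_partition_threshold";

/// Hypothetical stand-in for a session config; NOT the DataFusion type.
#[derive(Debug, Default, Clone)]
struct ToyConfig {
    options: HashMap<String, String>,
    /// Models the presence of the `BallistaConfig` extension.
    has_ballista_extension: bool,
}

impl ToyConfig {
    /// Builder-style setter, so calls can be chained.
    fn set(mut self, key: &str, value: &str) -> Self {
        self.options.insert(key.to_string(), value.to_string());
        self
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.options.get(key).map(String::as_str)
    }
}

/// Models the guarded upgrade: the restricted defaults are applied only
/// when the config has not been through setup yet.
fn upgrade_guarded(mut cfg: ToyConfig) -> ToyConfig {
    if !cfg.has_ballista_extension {
        cfg = cfg
            .set(PREFER_HASH_JOIN, "false")
            .set(SINGLE_PARTITION_THRESHOLD, "0");
        // Assumption of this model: the first upgrade marks the config.
        cfg.has_ballista_extension = true;
    }
    cfg
}
```

Under these assumptions, upgrading `ToyConfig::default()` yields the restricted defaults, while upgrading a config that was already set up and then changed by the user leaves the user's values in place. These are the same two cases the new tests cover.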
   
   # Are there any user-facing changes?
   
   Yes. `datafusion.*` session settings set on the client (via `SET` or config
   overrides) are now honored by the scheduler. `round_robin_repartition`
   remains effectively enforced because the scheduler forces it off when
   building the execution context. No SQL semantics change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

