andygrove opened a new pull request, #1906:
URL: https://github.com/apache/datafusion-ballista/pull/1906
# Which issue does this PR close?
Closes #1776. Supersedes the earlier draft #1777 (which pinned a `branch-54`
git rev); this is a fresh upgrade off the latest `main` against the published
`datafusion = "54"`.
# Rationale for this change
DataFusion 54.0.0 is now released on crates.io, so Ballista can track it
with a normal version dependency instead of a git pin. Redoing the upgrade
on current `main`, rather than rebasing the stale draft, keeps the API
migration consistent with all the code that has landed since. It also
validates against the released crate, whose API differs from the pre-release
`branch-54` commit the draft targeted. For example,
`parse_protobuf_partitioning` now takes a `PhysicalPlanDecodeContext` rather
than the extra argument the draft removed.
# What changes are included in this PR?
- Workspace deps: `datafusion`/`datafusion-*` `54`, `arrow`/`arrow-flight`
`58.3`, `object_store` `0.13.2`, and `rustyline` `18` in `ballista-cli`.
- API migration to DataFusion 54:
  - `as_any` was removed from `ExecutionPlan`/`DataSource`/`PhysicalExpr`;
    downcast directly on the trait object, and upcast `dyn ShuffleWriter` to
    `dyn ExecutionPlan` before downcasting. `as_any` is retained where it still
    exists (Arrow arrays, `UserDefinedLogicalNode`, `ExtensionOptions`).
  - `ExecutionPlan::partition_statistics` now returns
    `Result<Arc<Statistics>>`.
  - `parse_protobuf_partitioning` / `parse_protobuf_hash_partitioning` take
    a `PhysicalPlanDecodeContext`; `serialize_partitioning` takes the codec
    plus the proto converter.
  - `TaskContext::new` and `FunctionRegistry` gained higher-order-function
    parameters/methods, wired with empty defaults in Ballista.
  - `BatchPartitioner::new_hash_partitioner` is now fallible.
- Test updates for DataFusion 54 behavior changes:
  - `approx_percentile_cont_with_weight` result value.
  - Shuffle-writer tests made robust to hash distribution (54 changed the
    hash split); assert row conservation and valid file paths rather than an
    exact per-partition split.
  - Execution-graph / dot tests build their physical plan with the Ballista
    session config (or `hash_join_single_partition_threshold = 0`) so the
    small test join keeps its `Partitioned` plan, matching production.
  - Regenerated AQE plan snapshots (54 folds a `ProjectionExec` into
    `HashJoinExec` as a `projection=` field).
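
To illustrate the `as_any` removal described in the API migration list, here is a minimal, std-only sketch of downcasting a trait object without an `as_any` method, and of upcasting a sub-trait object first. The names `Plan`, `Writer`, `SortWriter`, `Filter`, `downcast_plan`, and `writer_as_plan` are hypothetical stand-ins and are not DataFusion or Ballista APIs; the sketch assumes a toolchain with trait upcasting (Rust 1.86 or later) and does not claim to match how DataFusion 54 implements its downcast helpers.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for `ExecutionPlan`. Having `Any` as a supertrait is
// what allows a `dyn Plan` to be upcast to `dyn Any`.
trait Plan: Any {
    fn name(&self) -> &'static str;
}

// Hypothetical stand-in for `ShuffleWriter`, a sub-trait of `Plan`.
trait Writer: Plan {
    fn output_partitions(&self) -> usize;
}

struct SortWriter {
    partitions: usize,
}

struct Filter;

impl Plan for SortWriter {
    fn name(&self) -> &'static str {
        "SortWriter"
    }
}

impl Writer for SortWriter {
    fn output_partitions(&self) -> usize {
        self.partitions
    }
}

impl Plan for Filter {
    fn name(&self) -> &'static str {
        "Filter"
    }
}

/// Downcast a plan trait object to a concrete type with no `as_any` method:
/// upcast `dyn Plan` to `dyn Any`, then use the standard `downcast_ref`.
fn downcast_plan<T: Any>(plan: &Arc<dyn Plan>) -> Option<&T> {
    let any: &dyn Any = &**plan; // trait upcasting coercion (Rust 1.86+)
    any.downcast_ref::<T>()
}

/// Upcast the sub-trait object to the base trait object before downcasting.
fn writer_as_plan(writer: Arc<dyn Writer>) -> Arc<dyn Plan> {
    writer
}

fn main() {
    let writer: Arc<dyn Writer> = Arc::new(SortWriter { partitions: 4 });
    println!("writer reports {} output partitions", writer.output_partitions());

    let plan = writer_as_plan(writer);
    match downcast_plan::<SortWriter>(&plan) {
        Some(w) => println!("{} with {} partitions", plan.name(), w.partitions),
        None => println!("{} is not a SortWriter", plan.name()),
    }

    // Downcasting to the wrong concrete type yields `None` rather than panicking.
    let filter: Arc<dyn Plan> = Arc::new(Filter);
    assert!(downcast_plan::<SortWriter>(&filter).is_none());
}
```

The upcast step matters because the downcast helper in this sketch is written against the base trait object only; a `dyn Writer` has to be viewed as a `dyn Plan` before it can be inspected.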
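
The shuffle-writer test change above replaces an exact per-partition expectation with a row-conservation check. The following sketch shows that idea in isolation, under stated assumptions: `hash_partition` and `rows_conserved` are hypothetical helpers, rows are plain `u64` keys, and the standard library's `DefaultHasher` stands in for whatever hash DataFusion uses. It does not cover the file-path half of the real tests.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a hash-partitioning shuffle writer: each row is
/// assigned to one of `num_partitions` buckets by hashing its key.
fn hash_partition(rows: &[u64], num_partitions: usize) -> Vec<Vec<u64>> {
    assert!(num_partitions > 0, "need at least one output partition");
    let mut parts: Vec<Vec<u64>> = vec![Vec::new(); num_partitions];
    for &row in rows {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        let idx = (hasher.finish() % num_partitions as u64) as usize;
        parts[idx].push(row);
    }
    parts
}

/// Row conservation: the partitions together hold exactly the input rows,
/// regardless of how the hash function happened to split them.
fn rows_conserved(input: &[u64], parts: &[Vec<u64>]) -> bool {
    let mut seen: Vec<u64> = parts.iter().flatten().copied().collect();
    let mut expected = input.to_vec();
    seen.sort_unstable();
    expected.sort_unstable();
    seen == expected
}

fn main() {
    let rows: Vec<u64> = (0..100).collect();
    let parts = hash_partition(&rows, 2);

    // Assert properties that hold for any hash function, not an exact split.
    assert_eq!(parts.len(), 2);
    assert!(rows_conserved(&rows, &parts));

    let sizes: Vec<usize> = parts.iter().map(Vec::len).collect();
    println!("partition sizes: {:?}", sizes);
}
```

Asserting on the multiset of rows rather than on per-partition counts keeps such a test stable when the hash function or its seed changes between releases.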
# Are there any user-facing changes?
Ballista now runs on DataFusion 54. No Ballista API or SQL semantics change.
Verified: `cargo build --workspace --all-targets`,
`cargo clippy --all-targets --workspace -- -D warnings`, and
`cargo test --workspace` (770 passing) all clean.