timsaucer opened a new pull request, #1532:
URL: https://github.com/apache/datafusion-python/pull/1532

   # Which issue does this PR close?
   
   We do not have an issue for this.
   
   # Rationale for this change
   
   We are updating the upstream DataFusion dependency ahead of time so that we 
can cut the 54 release quickly once the new upstream version is published.
   
   # What changes are included in this PR?
   
   **Dependency bump:**
   
   - `[workspace.package].version`: `53.0.0` → `54.0.0`.
   - All `datafusion*` workspace deps switched from `version = "53"` to `git = 
"https://github.com/apache/datafusion", rev = 
"3d06bedcc9afbd65781ac1de28741c36140d2cbb"`.
   - `Cargo.lock` refreshed for the datafusion family only.
   
   **Rust compile fixes (28 errors):**
   
   - Drop `as_any` impls — upstream traits (`AggregateUDFImpl`, 
`ScalarUDFImpl`, `WindowUDFImpl`, `SchemaProvider`, `CatalogProvider`, 
`CatalogProviderList`, `TableProvider`, `TableSource`, `ExecutionPlan`) now 
have `Any` as a supertrait. Call sites switch from 
`arc.as_any().downcast_ref::<T>()` to the upstream-provided 
`arc.downcast_ref::<T>()` helper.
   - FFI provider conversions (`Arc<dyn X + Send>` → `Arc<dyn X>`): upstream 
`From<&FFI_*>` no longer carries the redundant `+ Send` bound now that the 
traits require Send/Sync as supertraits.
   - `Cast` / `TryCast`: `data_type: DataType` → `field: FieldRef`. Python 
`PyCast.data_type()` accessor preserved.
   - Stub match arms for new `Expr::HigherOrderFunction` / `Lambda` / 
`LambdaVariable` variants returning `Unsupported`. Upstream HOFs are not 
shipped yet 
([apache/datafusion#14205](https://github.com/apache/datafusion/issues/14205)).
   - Stub match arms for new `ScalarValue::ListView` / `LargeListView`. No 
53.1.0 scalar functions produce these directly.
   - `DatasetExec::partition_statistics` returns `Arc<Statistics>`; add new 
required `apply_expressions` trait method (leaf returns `Continue`).
   - `#[allow(deprecated)]` on `TableFunctionImpl::call` pending a 
`call_with_args` migration that needs `SessionState` plumbing.
   
   **Python test fixes (23 expectations) for upstream behavior changes:**
   
   - `median` / `approx_median` / `approx_percentile_cont` return `Float64` 
(was matching input type).
   - String functions (`concat_ws`, `lower`, `upper`, `repeat`, `reverse`, 
`split_part`, `translate`) return `StringView` for `StringView` input (was 
`String`).
   - `overlay` appends past end-of-string rather than replacing.
   - `arrays_zip` / `list_zip` struct field names changed from `c0`/`c1` to 
`"1"`/`"2"`.
   - Filter on mismatched cast types now errors (was 0 matches).
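   For concreteness, the new `overlay` behavior can be modeled in plain Python. 
This is a hedged sketch of the semantics described above (SQL-style 1-based 
`pos`, `count` defaulting to the replacement length), not DataFusion's actual 
implementation:

   ```python
   def overlay(s, replacement, pos, count=None):
       # pos is 1-based, SQL-style; count defaults to len(replacement).
       if count is None:
           count = len(replacement)
       start = pos - 1
       # New behavior per this PR: a start position past the end of the
       # string appends the replacement rather than replacing the string.
       return s[:start] + replacement + s[start + count:]
   ```

   With this model, `overlay("!", "--", 2)` yields `"!--"` (append past 
end-of-string), while the classic SQL example `overlay("Txxxxas", "hom", 2, 4)` 
still yields `"Thomas"`.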
   
   **Trivial wins from the `check-upstream` audit:**
   
   - New `DataFrame.alias(name)` — wraps the logical plan in a `SubqueryAlias` 
for self-joins and qualifier-style references.
   - `functions.__all__`: add `instr` and `position` (both already defined as 
public defs but missing from `__all__`).
   - Top-level `datafusion.__all__`: re-export `TableProviderFactory` and 
`TableProviderFactoryExportable` (previously reachable only via the 
`datafusion.catalog` submodule).
   
   # Are there any user-facing changes?
   
   Yes — several behavior changes inherited from upstream DataFusion 54 
(warrants `api change` label):
   
   - `median` / `approx_median` / `approx_percentile_cont` now return `Float64` 
rather than matching the input type.
   - String functions return `StringView` when fed `StringView` input 
(`concat_ws`, `lower`, `upper`, `repeat`, `reverse`, `split_part`, `translate`).
   - `overlay` semantics: passing a start position past the end of a string now 
appends the replacement, e.g. `overlay("!", "--", 2) → "!--"` (was `"--"`).
   - `arrays_zip` / `list_zip` field names changed: `c0`/`c1` → `"1"`/`"2"`.
   - Comparing a numeric column against an incompatible string literal in a 
filter now raises a `Cannot cast string` error, where previously it silently 
produced zero matches.
   - New additions: `DataFrame.alias(name)`; `instr` and `position` now appear 
under `from datafusion.functions import *`; `TableProviderFactory` and 
`TableProviderFactoryExportable` are now reachable from the top-level 
`datafusion` namespace.
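   The `arrays_zip` / `list_zip` field-name change can likewise be modeled in 
plain Python. This is a hedged sketch of the naming scheme only (equal-length 
input arrays assumed; upstream null-padding behavior is not modeled):

   ```python
   def arrays_zip(*arrays):
       # Struct field names are now "1", "2", ... (previously "c0", "c1", ...).
       return [
           {str(i + 1): value for i, value in enumerate(row)}
           for row in zip(*arrays)
       ]
   ```

   So zipping `[1, 2]` with `["a", "b"]` produces rows keyed `"1"` and `"2"`, 
where queries against DataFusion 53 would have used `c0` and `c1`.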
   
   ## Follow-ups (not in this PR)
   
   The `check-upstream` audit surfaced additional non-trivial gaps that each 
warrant their own design and PR:
   
   - `DataFrame.registry` / `into_optimized_plan` / `into_unoptimized_plan` / 
`into_parts` / `task_ctx` — each needs a new wrapper class (e.g. 
`FunctionRegistry`, `SessionState`, `TaskContext`).
   - `SessionContext` extensibility surface — I/O helpers 
(`read_batch`/`read_batches`), planner/rule extension (`add_optimizer_rule`, 
`add_analyzer_rule`, `register_expr_planner`, `register_relation_planner`, 
`with_function_factory`), state access (`state`, `runtime_env`, `task_ctx`, 
`new_with_state`), UDF introspection (`udf`/`udaf`/`udwf` lookup + listing), 
and misc helpers (`create_physical_expr`, `table_function`, `table_factory`, 
`parse_capacity_limit`). Tracked under EPIC 
[#24](https://github.com/apache/datafusion-python/issues/24).
   - Distinct-aware aggregates: `count_distinct`, `sum_distinct`, 
`avg_distinct`. Upstream design at 
[apache/arrow-datafusion#2407](https://github.com/apache/arrow-datafusion/issues/2407).
   - `TableFunctionImpl::call_with_args` migration — needs `SessionState` 
plumbing through `PyTableFunction`. Will be a user-facing API change.
   - FFI Protocol pipeline completions for `FFI_TableFunction` 
(`from_pycapsule`, `TableFunctionExportable`, ABC), 
`FFI_LogicalExtensionCodec`, `FFI_ExtensionOptions`.
   - Scalar `get_field_path` (variant of `get_field` taking a path expression).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]