kosiew opened a new pull request, #1243:
URL: https://github.com/apache/datafusion-python/pull/1243

   
   ## Which issue does this PR close?
   
   * Closes #1239.
   
   ---
   
   ## Rationale for this change
   
   This change unifies how table-like objects are represented and registered in 
DataFusion's Python bindings. Historically there were multiple ad-hoc ways to 
register tables (direct `Table` objects, FFI pycapsules exposed by Rust 
providers, `DataFrame` views, and the `register_table_provider` API). That 
fragmentation made the code harder to maintain, made FFI integration awkward, 
and caused subtle API surface inconsistencies.
   
   This patch introduces a single, high-level `TableProvider` wrapper in Python 
(backed by a `PyTableProvider` Rust type) and centralizes the logic that 
coerces various supported inputs into a concrete provider. It also:
   
   * Makes `SessionContext.register_table(...)` the single, preferred 
entrypoint for table registration.
   * Deprecates `SessionContext.register_table_provider(...)` in favor of 
`register_table` while preserving backward compatibility (it forwards to 
`register_table` and emits a `DeprecationWarning`).
   * Adds utilities to normalize/coerce supported inputs (native `Table`, the 
new `TableProvider` wrapper, PyCapsule-based foreign providers, and PyArrow 
datasets) into the expected Rust `TableProvider` implementation.
   
   Overall this reduces duplication, clarifies documentation and examples, and 
provides a clearer path for FFI authors to expose table providers to Python.
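
The deprecate-and-forward behavior described above can be pictured with a small, library-free sketch. `ToySessionContext` and its `_tables` dict are hypothetical stand-ins rather than the datafusion-python implementation; only the method names `register_table` and `register_table_provider` come from this PR, and the warning text is a placeholder.

```python
import warnings


class ToySessionContext:
    """Hypothetical stand-in for SessionContext; stores tables in a dict."""

    def __init__(self):
        self._tables = {}

    def register_table(self, name, table):
        # Preferred entry point: every registration ends up here.
        self._tables[name] = table

    def register_table_provider(self, name, provider):
        # Deprecated alias: warn first, then forward to the preferred method.
        warnings.warn(
            "register_table_provider is deprecated; use register_table",
            DeprecationWarning,
            stacklevel=2,
        )
        self.register_table(name, provider)


ctx = ToySessionContext()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    ctx.register_table_provider("t", object())
```

Under these assumptions the old call still registers the table, and exactly one `DeprecationWarning` is recorded.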
   
   ---
   
   ## What changes are included in this PR?
   
   **High-level summary**
   
   * New Python public API: `datafusion.TableProvider` wrapper 
(`python/datafusion/table_provider.py`)
   * New Rust `PyTableProvider` type and module (`src/table.rs`) exposing 
from-capsule and from-dataframe helpers and `__datafusion_table_provider__`
   * Centralized coercion helpers on the Rust side: `coerce_table_provider` and 
`table_provider_from_pycapsule` (src/utils.rs)
   * New Python helper utilities: `datafusion.utils._normalize_table_provider` 
(python/datafusion/utils.py)
   * Update `SessionContext.register_table(...)` to accept a `Table`, a 
`TableProvider`, or any object exporting `__datafusion_table_provider__` 
(Python + Rust)
   * Deprecate `register_table_provider(...)` and `TableProvider.from_view()` 
(Python + Rust) with warnings, while preserving behavior by delegating to the 
new API where appropriate.
   * Make `DataFrame.into_view()` return a `TableProvider` (Python) and return 
`PyTableProvider` from Rust `into_view`.
   * Export a helpful error message constant `EXPECTED_PROVIDER_MSG` to give 
clearer errors when users pass unsupported objects.
   * Update docs and user-guide examples to use `TableProvider` + 
`register_table`.
   * Add/modify tests to cover the new APIs and coercion rules.
   * Changelog entry documenting the deprecation of 
`SessionContext.register_table_provider`.
   
   **Files added**
   
   * `python/datafusion/table_provider.py` — high-level Python wrapper around 
the internal table provider.
   * `python/datafusion/utils.py` — helper `_normalize_table_provider` and 
pyarrow dataset handling.
   * `src/table.rs` — `PyTableProvider` Rust implementation.
   
   **Files modified (representative, not exhaustive)**
   
   * Python and docs: `__init__.py`, `catalog.py`, `context.py`, 
`dataframe.py`, `io/table_provider.rst`, `data-sources.rst`, plus examples and 
tests under `examples/` and `python/tests/`.
   * Rust: `src/utils.rs`, `src/catalog.rs`, `src/context.rs`, 
`src/dataframe.rs`, `src/udtf.rs`, `src/lib.rs`, and other modules adjusted to 
use the new table provider helpers.
   
   **Behavioral changes**
   
   * `SessionContext.register_table(name, table)` now accepts:
   
     * `datafusion.catalog.Table` (existing behavior preserved),
     * `datafusion.TableProvider` (new wrapper),
     * Objects exporting `__datafusion_table_provider__()` (pycapsule-based FFI 
providers),
     * `pyarrow.dataset.Dataset` instances.
   
   * `SessionContext.register_table_provider(...)` is deprecated and will warn; 
it forwards to `register_table` for backwards compatibility.
   
   * `TableProvider.from_view()` is deprecated in favor of 
`DataFrame.into_view()` and `TableProvider.from_dataframe()`; calling the 
deprecated method emits a `DeprecationWarning`.
   
   * `DataFrame.into_view()` now returns a `TableProvider` wrapper rather than 
the older internal table representation exposed directly to Python.
   
   * A common, clearer error message (`EXPECTED_PROVIDER_MSG`) is provided and 
exported for tests and user-facing errors.
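
The acceptance rules listed above can be sketched as a single dispatch function. This is a simplified illustration that uses no datafusion imports: every class here is a hypothetical stand-in, the order of the checks is an assumption, the wording of `EXPECTED_PROVIDER_MSG` is a placeholder rather than the PR's actual message, and PyArrow dataset handling is left out.

```python
# Placeholder wording; the real constant's text is defined by the PR.
EXPECTED_PROVIDER_MSG = (
    "Expected a Table, a TableProvider, or an object exporting "
    "__datafusion_table_provider__()"
)


class Table:
    """Stand-in for datafusion.catalog.Table."""


class TableProvider:
    """Stand-in for the new wrapper; just remembers what it wraps."""

    def __init__(self, source):
        self.source = source


class DataFrame:
    """Stand-in for datafusion.DataFrame."""

    def into_view(self):
        return TableProvider(self)


class ForeignProvider:
    """Stand-in for an FFI object; a real one would return a PyCapsule."""

    def __datafusion_table_provider__(self):
        return object()


def normalize_table_provider(obj):
    # Native tables and wrappers pass through unchanged.
    if isinstance(obj, (Table, TableProvider)):
        return obj
    # DataFrames are rejected with a message pointing at the supported path.
    if isinstance(obj, DataFrame):
        raise TypeError(
            "DataFrame cannot be registered directly; "
            "use into_view() or TableProvider.from_dataframe()"
        )
    # Anything exporting the capsule hook is wrapped.
    if hasattr(obj, "__datafusion_table_provider__"):
        return TableProvider(obj.__datafusion_table_provider__())
    raise TypeError(EXPECTED_PROVIDER_MSG)
```

With these stand-ins, `normalize_table_provider(DataFrame().into_view())` succeeds while `normalize_table_provider(DataFrame())` raises `TypeError`.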
   
   ---
   
   ## Are these changes tested?
   
   Yes — the PR includes unit and integration test updates and additions in 
`python/tests/` to cover:
   
   * Registering a table from a `TableProvider` created via `from_capsule`, 
`from_dataframe`, and via `DataFrame.into_view()`.
   * Registering PyArrow `Dataset` objects via `Schema.register_table` and 
`SessionContext.register_table`.
   * Ensuring `DataFrame` objects raise a clear `TypeError` when passed 
directly to `register_table` (guiding users to `into_view()` / 
`from_dataframe()`).
   * Tests asserting proper `DeprecationWarning` behavior for `from_view` and 
`register_table_provider`.
   
   If any tests still need to be added, they should exercise cross-language FFI 
flows (Rust-provided pycapsule -> Python `TableProvider.from_capsule` -> 
`register_table`).
   
   ---
   
   ## Are there any user-facing changes?
   
   Yes.
   
   **API additions / changes**
   
   * New public API: `datafusion.TableProvider` (Python).
   * `DataFrame.into_view()` returns a `TableProvider` (Python).
   * `SessionContext.register_table(name, table)` accepts broader inputs and is 
the canonical registration API.
   * `SessionContext.register_table_provider` is deprecated (will emit 
`DeprecationWarning` and forward to `register_table`).
   * `TableProvider.from_view()` is deprecated in favor of 
`DataFrame.into_view()` and `TableProvider.from_dataframe()`.
   * A new constant, `datafusion._internal.EXPECTED_PROVIDER_MSG` (re-exported 
as `datafusion.EXPECTED_PROVIDER_MSG`), provides a stable error message for 
consumers and tests.
   
   **Documentation**
   
   * User guide snippets and examples updated to show the new `TableProvider` 
and `register_table` usage patterns.
   * A changelog deprecation entry has been added.
   
   **Compatibility**
   
   * Backwards compatibility is preserved where feasible: existing code that 
calls `register_table_provider()` will continue to work but will receive a 
deprecation warning.
   * Users passing `DataFrame` objects directly to `register_table` will now 
get a clear error directing them to `into_view()`/`from_dataframe()`.
   
   **Breaking changes**
   
   * This PR is designed to be minimally breaking. It intentionally deprecates 
rather than removes prior APIs and issues `DeprecationWarning`s. However, code 
that relied on internal implementation details of the old table provider 
representation (rather than the stable public APIs) may require updates.
   
   ---
   
   ### Notes for reviewers
   
   * Focus on the coercion logic (`coerce_table_provider` / 
`_normalize_table_provider`): does it accept the right set of inputs and 
provide clear errors? Are there additional types we should accept?
   * Verify deprecation warning messaging and stacklevels to ensure they point 
at user code rather than library internals.
   * Confirm the documentation examples and user-guide reflect the recommended 
patterns (using `TableProvider` + `register_table`).
   * Ensure the exported `EXPECTED_PROVIDER_MSG` wording is acceptable and 
stable for users and tests.
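
One way to check the stacklevel point from the notes above is to compare the line a recorded warning is attributed to with the line of the calling code. The sketch below uses only the standard library; `deprecated_api` and `user_code` are hypothetical names, not functions from this PR.

```python
import inspect
import warnings


def deprecated_api():
    # stacklevel=2 attributes the warning to the caller of deprecated_api().
    warnings.warn("deprecated_api is deprecated", DeprecationWarning, stacklevel=2)


def user_code():
    # Record the line number of the call on the next line.
    call_line = inspect.currentframe().f_lineno + 1
    deprecated_api()
    return call_line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    expected_line = user_code()

# Line the warning machinery reported for the warning.
reported_line = caught[0].lineno
```

If `stacklevel` were left at its default of 1, `reported_line` would instead point at the `warnings.warn` call inside the library function.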
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

