kosiew opened a new pull request, #1253:
URL: https://github.com/apache/datafusion-python/pull/1253

   ## Which issue does this PR close?
   
   * Closes #1250 
   * 
   ## Rationale for this change
   
   A number of PyO3 `#[pyclass]` wrappers used in the Python bindings are thin 
wrappers around shared Rust state but expose `&mut self` methods (or public 
mutable fields). When Python code shares those objects across threads or keeps 
multiple references, PyO3 hands out a `PyRefMut` and later attempts to 
re-borrow will raise runtime `Already borrowed` errors. This breaks intended 
usage patterns where these thin wrappers should behave like `SessionContext` 
(which is `#[pyclass(frozen)]` and safe to hand around without borrow errors).
   
   To make the wrappers safe for concurrent and re-entrant access from Python 
we:
   
   * Mark the pyclasses `frozen` to allow `&self`-style APIs that don't require 
PyRefMut.
   * Introduce interior mutability using `Arc` + `parking_lot::{RwLock, Mutex}` 
for fields that need mutation or caching.
   * Replace internal ownership patterns to avoid handing out `PyRefMut` to 
Python while preserving the original behaviour and semantics.
   * Update call sites to use shared ownership and lock guards where necessary, 
avoiding `&mut self` on public methods exposed to Python.
   
   This makes the wrappers safe to share between Python threads and avoids 
spurious borrow errors while preserving behaviour.
   
   ## What changes are included in this PR?
   
   Summary of concrete changes:
   
   * **New dependency**: add `parking_lot = "0.12"` to `Cargo.toml` and update 
`Cargo.lock`.
   
   * **New tests**:
   
     * `python/tests/test_concurrency.py` — exercises concurrent access 
patterns (SqlSchema, Config, DataFrame, CaseBuilder reuse) from multiple 
threads.
     * Modifications / additions in `python/tests/test_expr.py` to assert case 
builder state is preserved after success/failure and that `when`-handles are 
independent.
   
   * **Rust code changes (selected files)**:
   
     * `src/common/schema.rs`
   
       * `SqlSchema` now wraps fields in `Arc<RwLock<...>>` and exposes 
`#[getter]` / `#[setter]` methods that use the locks. The `#[pyclass]` is 
marked `frozen`.
     * `src/config.rs`
   
       * `PyConfig` now stores an `Arc<RwLock<ConfigOptions>>`, uses `&self` 
for getters/setters, and locks internally.
     * `src/dataframe.rs`
   
       * `PyDataFrame` now stores `batches` in an 
`Arc<parking_lot::Mutex<Option<...>>>`, switching methods to take `&self` and 
using the mutex for caching.
     * `src/expr/conditional_expr.rs`
   
       * `PyCaseBuilder` now uses 
`Arc<parking_lot::Mutex<Option<CaseBuilder>>>` and carefully takes/stores the 
inner builder so `when` / `otherwise` / `end` can be called concurrently or 
re-entrantly while preserving builder state on error.
     * `src/expr/literal.rs`
   
       * `PyLiteral` methods changed to take `&self` where appropriate.
     * `src/functions.rs`
   
       * `case` and `when` constructors now return the new `PyCaseBuilder` form.
     * `src/record_batch.rs`
   
       * `PyRecordBatchStream` methods `next` / `__next__` no longer require 
`&mut self`.
   
   * **Behavioral changes within implementation**:
   
     * Methods that previously required `&mut self` (forcing PyO3 to give 
`PyRefMut`) now accept `&self` and use interior mutability for any necessary 
state changes.
     * Care is taken in `PyCaseBuilder` to preserve the builder when an 
operation returns an error so that subsequent calls behave predictably.
   
   Files changed (non-exhaustive):
   
   * `Cargo.toml`, `Cargo.lock`
   * `python/tests/test_concurrency.py` (new)
   * `python/tests/test_expr.py` (tests added)
   * `src/common/schema.rs`
   * `src/config.rs`
   * `src/dataframe.rs`
   * `src/expr/conditional_expr.rs`
   * `src/expr/literal.rs`
   * `src/functions.rs`
   * `src/record_batch.rs`
   
   ## Are these changes tested?
   
   Yes — this PR adds Python unit tests that exercise the concurrent access 
patterns which previously triggered the runtime borrow errors. Tests included:
   
   * `python/tests/test_concurrency.py` — exercise SqlSchema, Config and 
DataFrame concurrently; exercise re-use of `CaseBuilder` across threads.
   * Additions to `python/tests/test_expr.py` — check `CaseBuilder` preserves 
state on error and success and that `when`-handles are independent.
   
   ## Are there any user-facing changes?
   
   * **Behavior:** There are no intended user-facing API changes. Python 
callers will observe the same high-level behaviour; the primary change is that 
these wrappers can now be safely shared across threads or re-used in contexts 
that previously triggered `Already borrowed` errors.
   * **Backward compatibility:** No breaking changes to public Python APIs are 
intended. Marking `#[pyclass(frozen)]` plus using getters/setters and interior 
mutability means Python-level attribute assignment on internal fields is not 
supported — previously Python code likely could not safely rely on direct field 
mutation anyway, and public getters/setters preserve the supported surface.
   * **Performance:** Small runtime cost from locking where mutation is needed. 
`parking_lot` was chosen for its low overhead and is expected to be efficient; 
we tried to keep lock hold times minimal.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to