cyx-6 opened a new pull request, #593:
URL: https://github.com/apache/tvm-ffi/pull/593

   ## Summary
   
   Make `a.x is a.x` and `id(a.x)` stable in Python by attaching every Python 
wrapper to its underlying C++ object via a 32-byte `PyCustomizeAllocHeader` 
prepended to every Object allocation. Reported by Junru Shao as the largest 
source of agent / OAI-monorepo migration failures. Implements the design in 
Tianqi Chen's *PyObjectTying* doc (2026-05-01).
   
   Before:
   ```python
   a = MyClass(Inner(...))
   assert a.x is a.x          # False — fresh wrapper per attribute access
   assert id(a.x) == id(a.x)  # flaky
   ```
   
   After: both hold, and identity is preserved across a wrapper 
finalize-and-revive cycle whenever the C++ object outlives the wrapper.
   
   ## Design
   
   The prepended header has a three-state machine:
   
   | State | `py_obj` | `cached_mem` | Meaning |
   |-------|----------|--------------|---------|
   | (a)   | NULL     | NULL         | C++ obj exists; never wrapped |
   | (b)   | non-NULL | non-NULL     | Live Python wrapper |
   | (c)   | NULL     | non-NULL     | Wrapper finalized; memory cached |
   
   ```
   malloc start
   +------------------------------+--------+----------------------+
   | PyCustomizeAllocHeader       | <pad>  | Object (T)           |
   |   void*  py_obj              |        |  TVMFFIObject header_|
   |   void*  cached_mem          |        |   ...                |
   |   void(*)(void*) free_cb     |        |                      |
   +------------------------------+--------+----------------------+
                                           ^ T*, Object*, TVMFFIObject*
   ```
   
   `kPyHeaderOffset = 32` bytes on x86_64 (rounded up to 
`alignof(max_align_t)`). Header recovery is `(char*)tptr - kPyHeaderOffset`.
   
   ## Implementation
   
   - **Allocator funnel** — `SimpleObjAllocator::Handler<T>::New` and the array 
variant in `include/tvm/ffi/memory.h` allocate `kPyHeaderOffset + sizeof(T)`, 
zero-init the header, place T at the offset. The Weak-branch deleter calls 
`free_cb(cached_mem)` if set, then frees the whole block.
   - **Python-class bypass sites** — `PyClassDeleter`, `__ffi_new__`, and 
`__ffi_shallow_copy__` in `src/ffi/extra/dataclass.cc` were patched in parallel 
to keep the layout uniform across all Object allocations.
   - **State-(c) tp_finalize hook** — `def __del__` on `CObject` (object.pxi) 
is mapped to `tp_finalize` (PEP 442). On wrapper death with C++ strong-count > 
1, it clears `h.py_obj`, DecRefs the chandle, and `Py_INCREF(self)` to 
resurrect — CPython then skips `tp_dealloc` so the wrapper memory survives in 
place.
   - **CYTHON_USE_TP_FINALIZE=1 under USE_SABI** — Cython 3.x disables this for 
limited-API targets because `PyObject_CallFinalizerFromDealloc` was added to 
the limited API only in 3.13. A 16-line shim in `tvm_ffi_python_helpers.h` 
reimplements it from limited-ABI primitives (`PyType_GetSlot`, `Py_SET_REFCNT`, 
`Py_REFCNT`). Comment in CMakeLists.txt explains the dependency.
   - **Cache opt-in per call site** — `make_ret` gains a `cache_lookup` 
parameter. Default False. `FieldGetter` (attribute access) sets True. FFI 
return values, callback arg unpacking, and rvalue-ref paths use 
`make_ret_object_no_cache` so callbacks see wrappers distinct from the caller, 
preserving classical move/refcount semantics.
   - **RValueRef path** — `TVMFFIPyArgSetterObjectRValueRef_` eagerly clears 
the source's header binding via `_detach_chandle_binding` before the C++ side 
nulls `chandle`. Without this, the deleter would later call `PyObject_GC_Del` 
on a still-live, still-referenced Python wrapper. Regression test included.
   - **PyNativeObject exemption** — `String`, `Bytes` discard the transient 
`Object` wrapper after construction; no header binding is installed and no 
identity stability is needed.
   
   ## Test plan
   
   - [x] Full Python suite (`uv run pytest tests/python`) passes
   - [x] New `tests/python/test_pyobject_tying.py` covers:
     - State (b): `a.x is a.x`, stable `id(a.x)` over 100 accesses, distinct 
wrappers for distinct chandles
     - State (b→c→b) revive: 2000-cycle stress test asserts exactly one wrapper 
address; chandle preserved across revive
     - State (c) skipped when wrapper holds last C++ ref (regression for 
clean-free use-after-free)
     - RValueRef move (regression for the original `test_rvalue_ref` blocker)
     - Pickle round-trip preserves attribute identity
     - PyNativeObject (String/Bytes) exempt — no state (c) accumulation
     - Type-mismatch fallthrough on revive
     - Field setter after revive (mutable dataclass)
   - [ ] CI: lint, clang-tidy, doc, C++/Python/Rust on Linux x86_64 + aarch64, 
macOS arm64, Windows AMD64
   
   ## Out-of-scope follow-ups
   
   Documented in the design doc, not in this PR:
   
   - **Per-type opt-out** for hot small types via a `kSkipPyHeader` trait
   - **Header compression to 2 pointers** by replacing per-instance `free_cb` 
with a global module-init callback
   - **Free-threaded CPython 3.13t+** — replace GIL assumption with CAS on 
`h->py_obj` for the fast path


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to