apstenku123 opened a new pull request, #19504:
URL: https://github.com/apache/tvm/pull/19504
## Summary
Adds an opt-in environment variable `TVM_METAL_STORAGE_MODE` that lets users
allocate device data buffers as `MTLResourceStorageModeShared` (or `Managed`)
instead of the default `MTLResourceStorageModePrivate`. Default behaviour is
unchanged.

| value | mode | semantics |
| ----------------- | --------------------------------------- | ------------------------------------------------------ |
| unset / `private` | `MTLResourceStorageModePrivate` | default, GPU-only, preserves historical behaviour |
| `shared` | `MTLResourceStorageModeShared` | CPU+GPU mapped; required for zero-copy DLPack to MLX |
| `managed` | `MTLResourceStorageModeManaged` | macOS-only intermediate (driver tracks dirty pages) |
| anything else | `MTLResourceStorageModePrivate` + warn | safe fall-back |

The env var is read once, on the first call to `MetalWorkspace::AllocDataSpace`,
and cached for the lifetime of the process, so allocations pay no per-call
lookup cost. A new FFI
helper `metal.GetStorageMode` is registered alongside the existing
`metal.GetProfileCounters` / `metal.ResetProfileCounters` helpers so tests can
verify the resolved mode without an ObjC bridge.
The staging-buffer pool (`metal_common.h:383`) and temp-buffer pool
(`metal_device_api.mm:374`) already use `MTLStorageModeShared` and are
intentionally untouched — they're host-staging by design and don't fall under
the data-space allocator.
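
The resolution and caching rules described in this summary can be modelled in a few lines. The sketch below is a Python model of that behaviour, not the Objective-C++ code in `metal_device_api.mm`. The names `resolve_storage_mode` and `cached_storage_mode` are hypothetical, and case-insensitive matching is an assumption inferred from the mixed-case `Shared` case listed in the test plan.

```python
import functools
import os
import warnings

# Accepted values and the Metal storage mode each one selects.
_MODES = {
    "private": "MTLResourceStorageModePrivate",
    "shared": "MTLResourceStorageModeShared",
    "managed": "MTLResourceStorageModeManaged",
}


def resolve_storage_mode(raw):
    """Map a raw TVM_METAL_STORAGE_MODE value to a storage mode name."""
    if raw is None:
        # Env var unset: keep the historical default.
        return _MODES["private"]
    key = raw.lower()  # assumption: matching is case-insensitive
    if key in _MODES:
        return _MODES[key]
    # Anything else: warn and fall back to the safe default.
    warnings.warn(f"unknown TVM_METAL_STORAGE_MODE={raw!r}; using private")
    return _MODES["private"]


@functools.lru_cache(maxsize=None)
def cached_storage_mode():
    """Read the env var on first use and reuse the result afterwards."""
    return resolve_storage_mode(os.environ.get("TVM_METAL_STORAGE_MODE"))
```

One consequence of the read-once cache, visible in this model, is that changing the env var after the first allocation has no effect within the same process.
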
## Why
TVM's Metal device API has always allocated `MTLBuffer` with
`MTLResourceStorageModePrivate`. This is the right choice for pure-GPU
workloads (no CPU page mapping), but it blocks zero-copy DLPack interop with
other Metal-using frameworks that allocate Shared/Managed buffers — notably
`ml-explore/mlx`, which uses `MTLResourceStorageModeShared` everywhere. Two
allocators on the same `MTLDevice` produce buffers with different page-mapping
semantics; DLPack capsules from TVM cannot be consumed by `mx.array`
(live-tested: `std::bad_cast` on `mx.array(tvm_metal_capsule)`).
This change unblocks the bridge from TVM-NDArray to `mlx.array`: both wrap an
`MTLBuffer`, and their storage modes must match for a foreign capsule to be
consumable. It is the producer half of a pair; the consumer half is a parallel
ml-explore/mlx PR that adds `mx.from_dlpack(obj)`.
## Test plan
- [ ] `xcrun --sdk macosx clang++ -std=c++17 -framework Metal
syntax_check.mm -o syntax_check && ./syntax_check` — exercises env-var parsing
for all 6 cases (unset, shared, mixed-case Shared, invalid, managed, private).
- [ ] Build runtime: `mkdir build && cd build && cmake -DUSE_METAL=ON
-DUSE_LLVM=ON -DCMAKE_BUILD_TYPE=Release .. && make -j tvm_runtime`
- [ ] `./runtime_check` (TVM-linked probe): validates that the env var
reaches a real `MTLBuffer.storageMode`. Captured live on 2026-05-03 on an Apple
M4 Max for unset/shared/managed/private.
- [ ] `TVM_METAL_STORAGE_MODE=shared python -c "import tvm; arr =
tvm.nd.empty((4,), dtype='float32', device=tvm.metal()); print(arr.shape)"`
- [ ] CI: macos-arm64 runner in apache/tvm should exercise the existing
Metal tests; default behaviour (env unset) is unchanged.
## Caveats / non-goals
- This is a **copy-elision interop patch**, not a kernel-speed patch.
Default Private mode remains the right choice for TVM-only workloads.
- The patch artifact only changes `src/runtime/metal/metal_device_api.mm`;
it does not yet add an upstream `tests/python/runtime/...` file. A
subprocess-isolated Python test for the env-cache behaviour can be folded in if
maintainers want it in tree.
- Local Metal microbenchmarks on Apple M4 Max show Shared buffers remove the
staging-buffer + blit/wait cost at CPU↔Metal transfer boundaries (e.g., 1 MiB
CPU→Metal median 138.375 µs Private vs 12.750 µs Shared in a downstream probe).
These numbers are local-health checks, not in-tree benchmarks.
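
Because the mode is cached per process, a test for the env-cache behaviour has to start a fresh interpreter for each value. A minimal sketch of such a subprocess-isolated helper follows. `run_with_storage_mode` is a hypothetical name, and `TVM_PROBE` assumes a Metal-enabled TVM build in which the `metal.GetStorageMode` helper is reachable through `tvm.get_global_func`; the format of its return value is not specified in this PR, so the probe only prints it.

```python
import os
import subprocess
import sys


def run_with_storage_mode(mode, snippet):
    """Run `snippet` in a fresh Python process and return its stdout.

    TVM_METAL_STORAGE_MODE is set to `mode`, or removed when `mode` is None,
    so each call observes a clean, uncached environment.
    """
    env = dict(os.environ)
    env.pop("TVM_METAL_STORAGE_MODE", None)
    if mode is not None:
        env["TVM_METAL_STORAGE_MODE"] = mode
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the child process exits non-zero
    )
    return result.stdout.strip()


# Hypothetical probe; requires a TVM build with USE_METAL=ON.
TVM_PROBE = (
    "import tvm; "
    "print(tvm.get_global_func('metal.GetStorageMode')())"
)
```

On a Metal-enabled machine, a test could call `run_with_storage_mode("shared", TVM_PROBE)` and `run_with_storage_mode(None, TVM_PROBE)` and compare the two outputs; the helper itself does not depend on TVM.
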
## Pairing
Paired upstream patch: an ml-explore/mlx PR, filed in parallel, adds a
Metal-aware `mx.from_dlpack(obj)` consumer. Both patches must land for the
zero-copy MLX↔TVM use case to work end-to-end.
## Attribution
Co-developed with `cppmega.mlx` for Apple-Silicon Metal interop with MLX.