[GitHub] [arrow] oleksandr-pavlyk commented on pull request #34972: GH-34971: [Format] Enhance C-Data API to support non-cpu cases

via GitHub Mon, 24 Apr 2023 13:42:47 -0700


oleksandr-pavlyk commented on PR #34972:
URL: https://github.com/apache/arrow/pull/34972#issuecomment-1520799743

I assume we are talking about USM-based chunks. Dereferencing a USM pointer,
generically, is only safe in kernels executed on the same device and using the
same context that were used to perform the allocation. USM pointers of certain
types (USM-shared and USM-host) can also be accessed from the host. Deallocation
of USM allocations requires a context.

SYCL provides a mechanism to recover the device where the allocation was
made from the pointer and the context (see `sycl::get_pointer_device`, [sec.
4.8.4](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_unified_shared_memory_pointer_queries)).
Therefore, a USM allocation should be thought of as a tuple of a pointer and
the associated SYCL context.

To share USM allocations across libraries (modules) in the same process
space, the same context must be used. Two context instances constructed from
the same arguments (either SYCL device, or device selector callable) are going
to be different.

To facilitate sharing USM allocations, Intel has proposed an
[extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_default_context.asciidoc)
and implemented it in oneAPI DPC++ to define a canonical default context for
each SYCL platform.

This extension makes it possible to describe USM allocations bound to the
default platform context using a USM pointer and a SYCL device.
The SYCL device in this tuple can be represented by an integer, which is the
position of the device in the vector of all devices known to the SYCL runtime,
obtained using `sycl::device::get_devices()`. This position is guaranteed to be
stable (the same for different modules in a process, and the same between
process runs).
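
A minimal sketch of the index lookup described above. It is written against a
generic, equality-comparable `Device` type so that it compiles without a SYCL
toolchain; with SYCL available, `Device` could be `sycl::device` and the vector
could be the result of `sycl::device::get_devices()`. The helper names
`device_position` and `device_at` are hypothetical and are not part of SYCL or
dpctl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: position of `dev` in `all_devices`, or -1 if absent.
// `Device` only needs operator==, which sycl::device also provides.
template <typename Device>
int device_position(const std::vector<Device>& all_devices, const Device& dev) {
    auto it = std::find(all_devices.begin(), all_devices.end(), dev);
    if (it == all_devices.end()) {
        return -1;
    }
    return static_cast<int>(std::distance(all_devices.begin(), it));
}

// Hypothetical inverse: recover the device from its integer position.
template <typename Device>
const Device& device_at(const std::vector<Device>& all_devices, int position) {
    assert(position >= 0 &&
           static_cast<std::size_t>(position) < all_devices.size());
    return all_devices[static_cast<std::size_t>(position)];
}
```

The producer would send the integer returned by `device_position`, and the
consumer would call `device_at` on its own device vector. This relies on the
ordering being identical in both modules, as stated above.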

Hence USM allocations (only if bound to the default platform context) can be
shared using DLPack and represented in the DLPack struct using a pointer and an
integral device identifier. (DLPack support for USM allocations is implemented
in
[IntelPython/dpctl](https://github.com/IntelPython/dpctl/blob/master/dpctl/tensor/_dlpack.pyx)).
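
As a rough illustration of the pointer plus integer representation, the sketch
below mirrors the shape of DLPack's `DLDevice` (a device type and an integer
device id) in local types so that it stays self-contained. The type and
function names here (`DeviceType`, `Device`, `UsmDescriptor`, `describe_usm`)
are hypothetical stand-ins, and the value 14 is assumed to match `kDLOneAPI`
in `dlpack.h`; real code should include `dlpack.h` instead of redefining
anything.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for DLPack's DLDeviceType; 14 is assumed to match kDLOneAPI.
enum DeviceType : std::int32_t { kOneAPI = 14 };

// Local stand-in for DLPack's DLDevice: device type plus integer device id.
struct Device {
    DeviceType device_type;
    std::int32_t device_id;
};

// Hypothetical descriptor: a USM pointer together with the device it is bound
// to. The SYCL context is implied: the platform's default context.
struct UsmDescriptor {
    void* data;
    Device device;
};

// Builds a descriptor from a USM pointer and the device's position in the
// vector of all devices known to the SYCL runtime.
inline UsmDescriptor describe_usm(void* usm_ptr, std::int32_t device_position) {
    assert(device_position >= 0);
    return UsmDescriptor{usm_ptr, Device{kOneAPI, device_position}};
}
```

No context is stored in the descriptor, which is why this scheme only works for
allocations bound to the default platform context.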

I would therefore not recommend removing the information needed to reconstruct
the context from `ArrowDeviceArray` unless Arrow were to restrict itself to
heterogeneous systems in which SYCL only sees a single platform (i.e. only CUDA
devices, only Level-Zero devices, or only HIP devices).

To support sharing of USM allocations which may still be due to be accessed
by scheduled, but as yet unfinished kernels, one can use SYCL events (one may
need to support providing more than one event). The consumer would then gate
kernels accessing shared allocations on those events to avoid race conditions.

@tchen suggested in
https://github.com/dmlc/dlpack/issues/57#issuecomment-753696812 that instead of
using events, a consumer would give the producer a stream (or `sycl::queue`)
that it would use to process the shared USM allocation, and the producer would
insert a barrier gated by the events from submitted tasks that may yet modify
the shared allocation.

Support for such barriers is yet to make it into the SYCL standard (see
[this oneAPI
extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_enqueue_barrier.asciidoc)
which is implemented in the oneAPI DPC++ compiler). Such a mechanism is implemented
in IntelPython/dpctl (see
[here](https://github.com/IntelPython/dpctl/blob/master/dpctl/tensor/_usmarray.pyx#L934-L935)).
