oleksandr-pavlyk commented on PR #34972:
URL: https://github.com/apache/arrow/pull/34972#issuecomment-1520799743

   I assume we are talking about USM-based chunks. Dereferencing a USM pointer 
is, in general, only safe in kernels executed on the same device and using the 
same context that were used to perform the allocation. USM pointers of certain 
types (USM-shared and USM-host) can also be accessed from the host. Deallocating 
a USM allocation requires a context.
   
   SYCL provides a mechanism to recover the device on which an allocation was 
made, given the pointer and the context (see `sycl::get_pointer_device`, [sec. 
4.8.4](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_unified_shared_memory_pointer_queries)).
 Therefore, a USM allocation should be thought of as a tuple of a pointer and 
the associated SYCL context. 
   
   To share USM allocations across libraries (modules) in the same process 
space, the same context must be used. Note that two context instances 
constructed from the same arguments (either a SYCL device or a device selector 
callable) are distinct contexts.
   
   To facilitate sharing USM allocations, Intel has proposed an 
[extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_default_context.asciidoc)
 that defines a canonical default context for each SYCL platform, and has 
implemented it in oneAPI DPC++.
   
   With this extension, a USM allocation bound to the default platform context 
can be described by a USM pointer and a SYCL device. 
   The SYCL device in this tuple can be represented by an integer, namely the 
position of the device in the vector of all devices known to the SYCL runtime, 
obtained using `sycl::device::get_devices()`. This position is guaranteed to be 
stable (the same for different modules in a process, and the same between 
process runs).
   
   Hence USM allocations (only if bound to the default platform context) can be 
shared using DLPack and represented in the DLPack struct using a pointer and an 
integral device identifier. (DLPack support for USM allocations is implemented 
in 
[IntelPython/dpctl](https://github.com/IntelPython/dpctl/blob/master/dpctl/tensor/_dlpack.pyx)).
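
   As a rough illustration of the pointer-plus-device-id representation, here 
is a minimal sketch in standard C++. It is not taken from dpctl or DLPack: 
`UsmDescriptor`, `device_position`, `describe` and `device_from_id` are 
hypothetical names, and the device type is a template parameter so that the 
sketch compiles without a SYCL toolchain. With SYCL, one would presumably pass 
the vector returned by `sycl::device::get_devices()`, relying on `sycl::device` 
being equality comparable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical descriptor: a USM pointer plus the integral position of its
// device in the runtime's device list. It is only meaningful for allocations
// bound to the default platform context.
struct UsmDescriptor {
    void*        ptr;
    std::int32_t device_id;
};

// Position of `dev` in `devices`, or -1 if `dev` is not in the list.
template <typename Device>
std::int32_t device_position(const std::vector<Device>& devices,
                             const Device& dev) {
    auto it = std::find(devices.begin(), devices.end(), dev);
    if (it == devices.end()) return -1;
    return static_cast<std::int32_t>(std::distance(devices.begin(), it));
}

// Producer side: pair the pointer with the device's position.
// device_id is -1 if the device is not known to the runtime.
template <typename Device>
UsmDescriptor describe(void* ptr, const std::vector<Device>& devices,
                       const Device& dev) {
    assert(ptr != nullptr);  // a null pointer does not name an allocation
    return UsmDescriptor{ptr, device_position(devices, dev)};
}

// Consumer side: recover the device from the integral id.
// Throws std::out_of_range for ids outside the device list (including -1).
template <typename Device>
Device device_from_id(const std::vector<Device>& devices, std::int32_t id) {
    return devices.at(static_cast<std::size_t>(id));
}
```

   The consumer would then combine the recovered device with the default 
context of that device's platform to obtain the full (pointer, context) pair.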
   
   I would therefore not recommend removing the information needed to 
reconstruct the context from `ArrowDeviceArray`, unless Arrow were to restrict 
itself to heterogeneous systems in which SYCL only sees a single platform (i.e. 
only CUDA devices, only Level-Zero devices, or only HIP devices).
   
   To support sharing USM allocations that may still be accessed by kernels 
that have been scheduled but have not yet finished, one can use SYCL events 
(providing more than one event may need to be supported). The consumer would 
then gate kernels accessing the shared allocations on these events to avoid 
race conditions.
   
   @tchen suggested in 
https://github.com/dmlc/dlpack/issues/57#issuecomment-753696812 that instead of 
using events, the consumer would give the producer a stream (or `sycl::queue`) 
that it would use to process the shared USM allocation, and the producer would 
insert into it a barrier gated by the events from submitted tasks that may yet 
modify the shared allocation. 
   
   Support for such barriers is yet to make it into the SYCL standard (see 
[this oneAPI 
extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_oneapi_enqueue_barrier.asciidoc)
 which is implemented in the oneAPI DPC++ compiler). Such a mechanism is implemented 
in IntelPython/dpctl (see 
[here](https://github.com/IntelPython/dpctl/blob/master/dpctl/tensor/_usmarray.pyx#L934-L935)).
   
    

