srkreddy1238 opened a new pull request, #13413:
URL: https://github.com/apache/tvm/pull/13413
OpenCL lets the host access device memory through memory mapping. The OpenCL
flag "CL_MEM_ALLOC_HOST_PTR", passed when creating a memory object, enables this.
This PR enables the feature through the compilation setting
"USE_OPENCL_ENABLE_HOST_PTR" and adds a new API, "GetNativePtr", to DeviceAPI
and to the NDArray class.
This allows an application to use hardware-allocated memory directly while
preparing the input. On the user side, we allocate an NDArray of the same size
as the graph input, access its native memory, and finally call
set_input_zero_copy to set the input.
Pseudocode looks like:
auto narr = tvm::runtime::NDArray::Empty(shape, {kDLFloat, 32, 1}, {kDLOpenCL, 0});
void* nptr = narr.GetNativePtr();
... access memory pointed to by nptr, up to the tensor size ...
tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input_zero_copy");
set_input(i, narr);
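The "access memory up to the tensor size" step can be sketched in plain C++ without TVM or OpenCL. This is a minimal illustration, not code from this PR: TensorBytes and FillNative are hypothetical helper names, and the byte count assumes a dense tensor whose element width is rounded up to whole bytes from a DLPack-style (bits, lanes) dtype.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of TVM): number of bytes occupied by a dense
// tensor with the given shape and a DLPack-style dtype (bits per lane, lanes).
inline std::size_t TensorBytes(const std::vector<int64_t>& shape, int bits, int lanes) {
  std::size_t elems = 1;  // an empty shape denotes a scalar, i.e. one element
  for (int64_t dim : shape) {
    assert(dim >= 0);
    elems *= static_cast<std::size_t>(dim);
  }
  // Round the element width up to whole bytes.
  return elems * ((static_cast<std::size_t>(bits) * lanes + 7) / 8);
}

// Hypothetical helper: copy host floats into the mapped pointer without ever
// writing past capacity_bytes. Returns the number of bytes actually written.
inline std::size_t FillNative(void* nptr, std::size_t capacity_bytes,
                              const std::vector<float>& src) {
  assert(nptr != nullptr);
  const std::size_t want = src.size() * sizeof(float);
  const std::size_t n = want < capacity_bytes ? want : capacity_bytes;
  std::memcpy(nptr, src.data(), n);
  return n;
}
```

In the flow above, the pointer returned by GetNativePtr would be passed as nptr and TensorBytes(shape, 32, 1) as capacity_bytes, so the application never writes beyond the allocated tensor.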