srkreddy1238 commented on a change in pull request #52:
URL: https://github.com/apache/tvm-rfcs/pull/52#discussion_r787273682



##########
File path: rfcs/0052-OpenCLML-integratio-as-BYOC.md
##########
@@ -0,0 +1,90 @@
+- Feature Name: OpenCL ML integration as BYOC
+- Start Date: 2022-01-13
+- RFC PR: [apache/tvm-rfcs#52](https://github.com/apache/tvm-rfcs/pull/52)
+- GitHub Issue: TBD
+
+
+# Summary
+[summary]: #summary
+
+OpenCL ML is an extension (cl_qcom_ml_ops) to the OpenCL spec, developed by
Qualcomm to accelerate machine learning at the operation level. The OpenCL ML
SDK is publicly available at "OpenCL Machine Learning Acceleration on Adreno
GPU - Qualcomm Developer Network". OpenCL ML leverages deep knowledge of the
Adreno GPU for significant performance benefits. It offers a C-based DNN API
compatible with most of the standard frameworks. It uses standard OpenCL
features such as command queues, buffers, and events, and supports FP16 and
FP32 data types. CLML API calls can be interleaved with other OpenCL kernels
(i.e., TVM generated kernels) and dispatched to the same command queue. This
extension is compatible with existing OpenCL extensions for importing memory,
controlling performance and data access.
+
+# Motivation
+[motivation]: #motivation
+
+The current OpenCL backend of TVM is generic and not well optimized for
Adreno's performance capabilities. Adreno GPUs offer several proprietary and
standard OpenCL paths. The OpenCL ML extension offers accelerated ML operations
via an SDK interface.
+
+TVM provides a complete framework of frontends and graph-level optimizations,
and OpenCL ML provides kernels that perform best on Adreno GPUs. In this work
we aim to integrate the OpenCL ML SDK into TVM as BYOC. This brings the best of
both worlds: TVM handles high-level optimizations, subgraphs that OpenCL ML
supports are scheduled on OpenCL ML, and operators that OpenCL ML does not
support take TVM's default OpenCL path. The two paths do not need separate
OpenCL workspaces or command queues; they can share the same command queue.
Data (DLTensor) transfer across subgraphs is also seamless with the OpenCL ML
APIs.

Review comment:
       TVM's default OpenCL device API already creates an OpenCL workspace
(context and command queue), so the CLML runtime does not need to create a new
OpenCL context and queue. The existing workspace is reused by accessing it
through ```tvm::runtime::Registry::Get("device_api.opencl");```. The advantage
is that memory objects (clBuffers, etc.) created by TVM's default OpenCL device
API and by the CLML API live under the same context.
   
   Because the CL memory objects share one context, we can use either the CL
API (```clEnqueueCopyBuffer```) or the CLML API
(```clEnqueueCopyMLTensorDataQCOM```) to copy data across a subgraph boundary.
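
A minimal sketch of why the shared context matters, using plain C++ stand-ins rather than the real TVM or OpenCL APIs. Everything here (`MockWorkspace`, `MockBuffer`, `GetWorkspace`, `AllocBuffer`, `EnqueueCopy`) is hypothetical and exists only for illustration. The assumption it models is that both runtimes look up the workspace by the same name, so buffers allocated on either path belong to one context and can be copied on one queue.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an OpenCL workspace (context + command queue).
// This is NOT TVM's workspace class; it only models "one shared context".
struct MockWorkspace {
  int context_id = 0;              // stands in for a cl_context
  std::vector<std::string> queue;  // stands in for a cl_command_queue
};

// Hypothetical stand-in for a device buffer that remembers its context.
struct MockBuffer {
  const MockWorkspace* owner;
  std::vector<float> data;
};

// Toy lookup-by-name registry: the first call for a name creates the
// workspace, later calls with the same name return the same instance.
std::shared_ptr<MockWorkspace> GetWorkspace(const std::string& name) {
  static std::map<std::string, std::shared_ptr<MockWorkspace>> table;
  static int next_context_id = 1;
  auto it = table.find(name);
  if (it == table.end()) {
    auto ws = std::make_shared<MockWorkspace>();
    ws->context_id = next_context_id++;
    it = table.emplace(name, ws).first;
  }
  return it->second;
}

// Allocate a buffer of n floats that belongs to the given workspace.
MockBuffer AllocBuffer(const MockWorkspace& ws, std::size_t n, float fill) {
  return MockBuffer{&ws, std::vector<float>(n, fill)};
}

// Copy src into dst on the workspace's queue. The copy is rejected unless
// both buffers belong to the workspace that owns the queue.
void EnqueueCopy(MockWorkspace& ws, const MockBuffer& src, MockBuffer& dst) {
  if (src.owner != &ws || dst.owner != &ws) {
    throw std::invalid_argument("buffers must belong to the queue's context");
  }
  dst.data = src.data;
  ws.queue.push_back("copy");
}
```

In this model, two calls to `GetWorkspace("device_api.opencl")` return the same object, so a buffer allocated on the "TVM" side and one allocated on the "CLML" side can be passed to the same `EnqueueCopy`. A buffer from a workspace registered under a different name is rejected, which is the situation the shared workspace avoids.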



