srkreddy1238 commented on a change in pull request #52: URL: https://github.com/apache/tvm-rfcs/pull/52#discussion_r787276809
##########
File path: rfcs/0052-OpenCLML-integratio-as-BYOC.md
##########
@@ -0,0 +1,90 @@
+- Feature Name: OpenCL ML integration as BYOC
+- Start Date: 2022-01-13
+- RFC PR: [apache/tvm-rfcs#52](https://github.com/apache/tvm-rfcs/pull/52)
+- GitHub Issue: TBD
+
+
+# Summary
+[summary]: #summary
+
+OpenCL ML is an extension (cl_qcom_ml_ops) over OpenCL spec developed by Qualcomm to accelerate the machine learning at operation level. OpenCL SDK is publicly available at OpenCL Machine Learning Acceleration on Adreno GPU - Qualcomm Developer Network. OpenCL ML leverages deep knowledge of Adreno GPU for significant performance benefits. It offers C based DNN API with compatibility to most of the standard frameworks. Its standard OpenCL features like command queues, buffers, events and supports FP16 and FP32 data types. CLML API calls can be interleaved with other OpenCL kernels (i.e., TVM generated kernels) and dispatched to the same command queue. This extension is compatible with existing OpenCL extensions for importing memory, controlling performance and data access.
+
+# Motivation
+[motivation]: #motivation
+
+The current OpenCL backend of TVM is very generic and not optimized well for Adreno performance capabilities. Adreno GPU has quite a few proprietary and standard OpenCL paths. OpenCL ML extension offers accelerated ML operations via an SDK interface.
+
+With TVM having the entire framework of frontends, graph level optimizations and OpenCL ML having kernels that perform best on Adreno GPU, in this work we aim to integrate OpenCLML SDK into TVM as a BYOC. This effort brings best of both worlds where TVM handling high level optimizations, sub graphs are scheduled on OpenCL ML based on the support and the operators not supported by OpenCL ML will take TVM’s default OpenCL path. Good thing here is we don’t need separate OpenCL workspaces or command queues for both paths, instead they can share the command queues.
Also, data (DLTensor) transfer across subgraphs is seamless with OpenCL ML API’s.

Review comment:
BYOC partitions the graph into subgraphs based on operator support, and switching from one subgraph to another involves a data copy (DLTensor to/from runtime-specific objects). This copy sometimes requires bringing the memory to the host and then copying it to the new runtime. In our case the OpenCL DLTensor is backed by a clBuffer, and the CLML tensor is also backed by a clBuffer. Hence, we can use the OpenCL (or CLML) copy API for a direct copy within the CL context. Alternatively, if the data layout is the same on both sides, we can make the CLML tensor use the clBuffer created for the DLTensor directly.
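The choice between the transfer paths described in the comment above can be sketched as a small decision function. This is only an illustrative model in Python, not TVM or CLML code: `TensorDesc`, `Transfer`, and `plan_transfer` are hypothetical names, the integer context id stands in for a real OpenCL context handle, and the layout strings are assumed examples.

```python
from dataclasses import dataclass
from enum import Enum


class Transfer(Enum):
    """Hypothetical labels for the three paths discussed in the comment."""
    HOST_ROUND_TRIP = "copy device -> host -> new runtime"
    DEVICE_COPY = "copy buffer to buffer inside the CL context"
    SHARE_BUFFER = "reuse the DLTensor's buffer, no copy"


@dataclass(frozen=True)
class TensorDesc:
    """Hypothetical description of where a tensor lives."""
    backing: str   # "cl_buffer" or "host" (assumed labels)
    context: int   # stand-in id for the OpenCL context
    layout: str    # e.g. "NCHW" (assumed example)


def plan_transfer(src: TensorDesc, dst: TensorDesc) -> Transfer:
    """Pick the cheapest transfer path between two subgraph tensors."""
    both_on_device = src.backing == "cl_buffer" and dst.backing == "cl_buffer"
    if not (both_on_device and src.context == dst.context):
        # General BYOC case: the runtimes share no device memory.
        return Transfer.HOST_ROUND_TRIP
    if src.layout == dst.layout:
        # Same context and same layout: the buffer can be shared directly.
        return Transfer.SHARE_BUFFER
    # Same context, different layout: stay on the device and use the
    # OpenCL (or CLML) copy API, e.g. a clEnqueueCopyBuffer-style call.
    return Transfer.DEVICE_COPY
```

For example, with `dl = TensorDesc("cl_buffer", 0, "NCHW")`, calling `plan_transfer(dl, TensorDesc("cl_buffer", 0, "NCHW"))` selects the shared-buffer path, while a destination in a different context falls back to the host round trip.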
