[GitHub] [tvm] echuraev commented on a diff in pull request #14299: [OpenCL][Adreno] Introduction of adreno recordable queues feature

via GitHub Tue, 14 Mar 2023 22:35:50 -0700


echuraev commented on code in PR #14299:
URL: https://github.com/apache/tvm/pull/14299#discussion_r1136520765



##########
3rdparty/cutlass:
##########


Review Comment:
   Did you change something in this submodule?



##########
adreno_recording/adreno_recording_executor.py:
##########
@@ -0,0 +1,122 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import tvm._ffi
+
+from tvm._ffi.base import string_types
+from tvm.contrib import graph_executor
+
+
+def create(graph_json_str, libmod, device):
+    """Create a runtime executor module given a graph and module.
+
+    Parameters
+    ----------
+    graph_json_str : str
+        The graph to be deployed in json format output by json graph.
+        The graph can contain operator(tvm_op) that points to the name
+        of PackedFunc in the libmod.
+
+    libmod : tvm.runtime.Module
+        The module of the corresponding function
+
+    device : Device
+        The device to deploy the module, only supports Adreno GPU
+
+    Returns
+    -------
+    graph_module : GraphModuleAdrenoRecording
+        Adreno recording graph executor module that can be used to execute the 
graph.
+
+    """
+    assert isinstance(graph_json_str, string_types)
+    try:
+        dev, num_rpc_dev, device_type_id = graph_executor.get_device(libmod, 
device)
+        if num_rpc_dev == len(dev):
+            fcreate = 
dev[0]._rpc_sess.get_function("tvm.graph_executor_adreno_recording.create")
+        else:
+            fcreate = 
tvm._ffi.get_global_func("tvm.graph_executor_adreno_recording.create")
+    except ValueError:
+        raise ValueError(
+            "To enable adreno recording, please set "
+            "'(USE_ADRENO_RECORDING ON)' in config.cmake and rebuild TVM"
+        )
+
+    return GraphModuleAdrenoRecording(fcreate(graph_json_str, libmod, 
*device_type_id))
+
+
+class GraphModuleAdrenoRecording(graph_executor.GraphModule):
+    """Adreno recording graph executor module.
+
+    This is a adreno recording graph executor wrapper over the TVM runtime.
+    Runtime interfaces are wrapped with adreno recording functionalities.
+
+    Parameters
+    ----------
+    module : Module
+        The internal tvm module that holds the actual graph functions.
+    """
+    def __init__(self, module):
+        self._start_capture = module["start_capture"]
+        self._end_capture = module["end_capture"]
+        self._run_recording = module["run_graph"]
+        self._cuda_graph_captured = False
+        graph_executor.GraphModule.__init__(self, module)
+
+    def capture_graph(self):
+        """
+        Capture tvm_op graph to store in the recording
+        """
+        self._start_capture()
+        self._run()
+        self._end_capture()
+        self._graph_captured = True
+
+    def run_recording(self):
+        """
+        Run recorded operations
+        """
+        self._run_recording()
+
+    def run(self, **input_dict):
+        """A run wrapper for graph capture / launch, user can just
+        change default graph executor to adrebo recording graph executor, and
+        the first call will capture a recording for future launch
+
+        Parameters
+        ----------
+        input_dict: dict of str to NDArray
+            List of input values to be feed to
+        """
+        if input_dict:
+            self.set_input(**input_dict)
+        if not self._graph_captured:
+            self.capture_graph()
+        else:
+            self._run_recording()
+
+    def debug_get_output(self, node, out):
+        """Run graph up to node and get the output to out
+
+        Parameters
+        ----------
+        node : int / str
+            The node index or name
+
+        out : NDArray
+            The output array container
+        """
+        raise NotImplementedError("Please use debugger.debug_executor as 
graph_executor instead.")

Review Comment:
   Probably class `graph_executor.GraphModule` has the same implementation and 
this method will be inherited.



##########
src/runtime/opencl/opencl_common.h:
##########
@@ -264,6 +272,23 @@ class OpenCLWorkspace : public DeviceAPI {
         << "Invalid OpenCL device_id=" << dev.device_id << ". " << GetError();
     return queues[dev.device_id];
   }
+  cl_command_queue GetRecQueue(Device dev) {
+    ICHECK(IsOpenCLDevice(dev));
+    this->Init();
+    ICHECK(dev.device_id >= 0 && static_cast<size_t>(dev.device_id) < 
rec_queues.size())
+        << "Invalid OpenCL device_id=" << dev.device_id << ". " << GetError();
+    return rec_queues[dev.device_id];

Review Comment:
   ```suggestion
       return rec_queues[GetCLDeviceID(dev.device_id)];
   ```



##########
adreno_recording/__init__.py:
##########
@@ -0,0 +1,16 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.

Review Comment:
   Probably you should add import of `adreno_recording_executor.py` here.



##########
adreno_recording/adreno_recording_executor.py:
##########
@@ -0,0 +1,122 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import tvm._ffi
+
+from tvm._ffi.base import string_types
+from tvm.contrib import graph_executor
+
+
+def create(graph_json_str, libmod, device):
+    """Create a runtime executor module given a graph and module.
+
+    Parameters
+    ----------
+    graph_json_str : str
+        The graph to be deployed in json format output by json graph.
+        The graph can contain operator(tvm_op) that points to the name
+        of PackedFunc in the libmod.
+
+    libmod : tvm.runtime.Module
+        The module of the corresponding function
+
+    device : Device
+        The device to deploy the module, only supports Adreno GPU

Review Comment:
   Not sure that we can do it, just an idea. Can we check that this `device` 
corresponds to target `opencl --device=adreno` in this function?



##########
adreno_recording/adreno_recording_executor.py:
##########
@@ -0,0 +1,122 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import tvm._ffi
+
+from tvm._ffi.base import string_types
+from tvm.contrib import graph_executor
+
+
+def create(graph_json_str, libmod, device):
+    """Create a runtime executor module given a graph and module.
+
+    Parameters
+    ----------
+    graph_json_str : str
+        The graph to be deployed in json format output by json graph.
+        The graph can contain operator(tvm_op) that points to the name
+        of PackedFunc in the libmod.
+
+    libmod : tvm.runtime.Module
+        The module of the corresponding function
+
+    device : Device
+        The device to deploy the module, only supports Adreno GPU
+
+    Returns
+    -------
+    graph_module : GraphModuleAdrenoRecording
+        Adreno recording graph executor module that can be used to execute the 
graph.
+
+    """
+    assert isinstance(graph_json_str, string_types)
+    try:
+        dev, num_rpc_dev, device_type_id = graph_executor.get_device(libmod, 
device)
+        if num_rpc_dev == len(dev):
+            fcreate = 
dev[0]._rpc_sess.get_function("tvm.graph_executor_adreno_recording.create")
+        else:
+            fcreate = 
tvm._ffi.get_global_func("tvm.graph_executor_adreno_recording.create")
+    except ValueError:
+        raise ValueError(
+            "To enable adreno recording, please set "
+            "'(USE_ADRENO_RECORDING ON)' in config.cmake and rebuild TVM"
+        )
+
+    return GraphModuleAdrenoRecording(fcreate(graph_json_str, libmod, 
*device_type_id))
+
+
+class GraphModuleAdrenoRecording(graph_executor.GraphModule):
+    """Adreno recording graph executor module.
+
+    This is a adreno recording graph executor wrapper over the TVM runtime.
+    Runtime interfaces are wrapped with adreno recording functionalities.
+
+    Parameters
+    ----------
+    module : Module
+        The internal tvm module that holds the actual graph functions.
+    """
+    def __init__(self, module):
+        self._start_capture = module["start_capture"]
+        self._end_capture = module["end_capture"]
+        self._run_recording = module["run_graph"]
+        self._cuda_graph_captured = False

Review Comment:
   Cuda? Looking at this class implementation, it should be `_graph_captured`?



##########
src/runtime/opencl/opencl_common.h:
##########
@@ -264,6 +272,23 @@ class OpenCLWorkspace : public DeviceAPI {
         << "Invalid OpenCL device_id=" << dev.device_id << ". " << GetError();
     return queues[dev.device_id];
   }
+  cl_command_queue GetRecQueue(Device dev) {

Review Comment:
   This method should be also located under `#ifdef USE_ADRENO_RECORDING`?



##########
adreno_recording/adreno_recording_executor.py:
##########
@@ -0,0 +1,122 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import tvm._ffi
+
+from tvm._ffi.base import string_types
+from tvm.contrib import graph_executor
+
+
+def create(graph_json_str, libmod, device):
+    """Create a runtime executor module given a graph and module.
+
+    Parameters
+    ----------
+    graph_json_str : str
+        The graph to be deployed in json format output by json graph.
+        The graph can contain operator(tvm_op) that points to the name
+        of PackedFunc in the libmod.
+
+    libmod : tvm.runtime.Module
+        The module of the corresponding function
+
+    device : Device
+        The device to deploy the module, only supports Adreno GPU
+
+    Returns
+    -------
+    graph_module : GraphModuleAdrenoRecording
+        Adreno recording graph executor module that can be used to execute the 
graph.
+
+    """
+    assert isinstance(graph_json_str, string_types)
+    try:
+        dev, num_rpc_dev, device_type_id = graph_executor.get_device(libmod, 
device)
+        if num_rpc_dev == len(dev):
+            fcreate = 
dev[0]._rpc_sess.get_function("tvm.graph_executor_adreno_recording.create")
+        else:
+            fcreate = 
tvm._ffi.get_global_func("tvm.graph_executor_adreno_recording.create")
+    except ValueError:
+        raise ValueError(
+            "To enable adreno recording, please set "
+            "'(USE_ADRENO_RECORDING ON)' in config.cmake and rebuild TVM"
+        )
+
+    return GraphModuleAdrenoRecording(fcreate(graph_json_str, libmod, 
*device_type_id))
+
+
+class GraphModuleAdrenoRecording(graph_executor.GraphModule):
+    """Adreno recording graph executor module.
+
+    This is a adreno recording graph executor wrapper over the TVM runtime.

Review Comment:
   ```suggestion
       This is an adreno recording graph executor wrapper over the TVM runtime.
   ```



##########
src/relay/transforms/annotate_texture_storage.cc:
##########
@@ -532,7 +533,11 @@ class RewriteVDStorageScopes : public 
transform::DeviceAwareExprMutator {
         VirtualDevice virtual_device_to =
             VirtualDevice(virtual_device->device_type(), 
virtual_device->virtual_device_id,
                           virtual_device->target, 
storage_scope_[arg][GetRef<Expr>(call_node)][0]);
-        new_arg = DeviceCopy(new_arg, virtual_device_from, virtual_device_to);
+        auto data_layout = 
call_node->op.as<FunctionNode>()->attrs.GetAttr<String>("data_layout").value();
+        if (data_layout.empty()) {
+          data_layout = 
call_node->op.as<FunctionNode>()->attrs.GetAttr<String>("layout").value();
+        }
+        new_arg = tvm::relay::MakeLayoutTransform(new_arg, data_layout, 
data_layout);

Review Comment:
   Did you do any performance measurements of this approach in comparison with 
`DeviceCopy`?



##########
src/runtime/opencl/opencl_common.h:
##########
@@ -50,14 +50,18 @@
  * files.  This also allows us to expose the OpenCL version through
  * tvm.runtime.Device.
  */
-#define CL_TARGET_OPENCL_VERSION 120
+#define CL_TARGET_OPENCL_VERSION 220

Review Comment:
   Emm... Are you sure that we want to up OpenCL version for all devices? I'm 
afraid that we shouldn't do it.
   If you wish, you can use `ifdef USE_ADRENO_RECORDING` or something like that 
for this purpose:
   ```c++
   #ifdef USE_ADRENO_RECORDING
   #define CL_TARGET_OPENCL_VERSION 220
   #else
   #define CL_TARGET_OPENCL_VERSION 120
   #endif
   ```



##########
src/runtime/opencl/opencl_module.cc:
##########
@@ -274,6 +281,49 @@ cl_kernel 
OpenCLModuleNode::InstallKernel(cl::OpenCLWorkspace* w, cl::OpenCLThre
   return kernel;
 }
 
+void OpenCLModuleNode::StartRecording() {
+#ifdef USE_ADRENO_RECORDING
+  cl_int err_code;
+  cl_command_queue rec_que;
+  auto w_ = cl::OpenCLWorkspace::Global();
+  for (size_t i = 0; i < w_->devices.size(); ++i) {
+    cl_device_id did = w_->devices[i];
+    rec_que = clCreateCommandQueue(w_->context, did, CL_QUEUE_RECORDABLE_QCOM, 
&err_code);
+    w_->rec_queues.push_back(rec_que);
+    OPENCL_CHECK_ERROR(err_code);
+    w_->recordings.push_back(clNewRecordingQCOM(rec_que, &err_code));
+    OPENCL_CHECK_ERROR(err_code);
+  }
+
+  this->is_recording = true;
+#endif
+}
+
+void OpenCLModuleNode::EndRecording() {
+#ifdef USE_ADRENO_RECORDING
+  auto w_ = cl::OpenCLWorkspace::Global();
+  cl::OpenCLThreadEntry* t = w_->GetThreadEntry();
+  cl_recording_qcom recording = w_->GetRecording(t->device);
+  OPENCL_CALL(clEndRecordingQCOM(recording));
+#endif
+}
+
+void OpenCLModuleNode::RunRecording() {
+#ifdef USE_ADRENO_RECORDING
+  auto w_ = cl::OpenCLWorkspace::Global();
+  cl::OpenCLThreadEntry* t = w_->GetThreadEntry();
+  cl_command_queue queue = w_->GetQueue(t->device);
+  cl_recording_qcom recording = w_->GetRecording(t->device);
+  if (w_->IsProfiling(t->device)) {
+    w_->GetEventQueue(t->device).resize(w_->GetEventQueue(t->device).size() + 
1);
+    OPENCL_CALL(clEnqueueRecordingQCOM(queue, recording, 0, nullptr, 0, 
nullptr, 0,
+                                  nullptr, 0, nullptr, 0, nullptr, 
&(w_->GetEventQueue(t->device).back())));
+  }
+  else {
+    OPENCL_CALL(clEnqueueRecordingQCOM(queue, recording, 0, nullptr, 0, 
nullptr, 0,
+                                      nullptr, 0, nullptr, 0, nullptr, 
nullptr));
+  }
+#endif

Review Comment:
   Not sure that it will be compiled. Looks like you missed the end bracket `}`.



##########
src/runtime/opencl/opencl_common.h:
##########
@@ -440,9 +465,13 @@ class OpenCLModuleNode : public ModuleNode {
   // install a new kernel to thread local entry
   cl_kernel InstallKernel(cl::OpenCLWorkspace* w, cl::OpenCLThreadEntry* t,
                           const std::string& func_name, const KTRefEntry& e);
+  void StartRecording();
+  void EndRecording();
+  void RunRecording();
   void SetPreCompiledPrograms(const std::string& bytes);
   std::string GetPreCompiledPrograms();
 
+  bool is_recording;

Review Comment:
   Move this variable below and add a comment with description.



##########
test_adreno_recording.py:
##########
@@ -0,0 +1,105 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import json
+import os
+import re
+import sys
+import time
+
+import pytest
+
+import tvm
+import tvm.testing
+from tvm import te
+import numpy as np
+from tvm import relay
+
+from tvm.contrib import utils, ndk 
+from tvm.contrib.adreno_recording import adreno_recording_executor as 
graph_runtime
+
+from utils.adreno_utils import get_cpu_reference, gpu_preprocess
+
+dtype = tvm.testing.parameter("float32")
+
+# @tvm.testing.adrenorecording

Review Comment:
   Should we uncomment this decarator?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm] echuraev commented on a diff in pull request #14299: [OpenCL][Adreno] Introduction of adreno recordable queues feature

Reply via email to