areusch commented on a change in pull request #7472:
URL: https://github.com/apache/tvm/pull/7472#discussion_r580511965



##########
File path: src/runtime/vm/profiler/vm.h
##########
@@ -51,7 +51,7 @@ class VirtualMachineDebug : public VirtualMachine {
                     const std::vector<ObjectRef>& args) final;
 
   std::unordered_map<Index, std::string> packed_index_map_;
-  std::unordered_map<Index, std::vector<double>> op_durations_;
+  std::unordered_map<Index, std::vector<TypedPackedFunc<int64_t()>>> 
op_durations_;

Review comment:
       this isn't really durations anymore, is it?

##########
File path: tests/python/unittest/test_runtime_profiling.py
##########
@@ -0,0 +1,26 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import tvm
+import time
+
+
[email protected]_targets
+def test_cpu_timer(target, ctx):
+    timer_stop = tvm.runtime.start_timer(ctx)
+    time.sleep(1e-3)
+    nanosecs = timer_stop()
+    assert nanosecs < 2e6 and nanosecs > 5e5

Review comment:
       I don't think we should do this, it's inviting a flaky test. do we have 
a mock DLContext?

##########
File path: python/tvm/runtime/profiling.py
##########
@@ -0,0 +1,57 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Runtime profiling functions."""
+import tvm._ffi
+
+
+def start_timer(ctx):
+    """
+    Start a low-overhead device specific timer.
+
+    Params
+    ------
+    ctx: TVMContext
+        The context to get a timer for.
+
+    Returns
+    -------
+    timer_start: function
+        A function that stops the device specific timer. Calling this function
+        stops the timer and returns another function that returns the elapsed
+        time in nanoseconds. This third function should be called as late as

Review comment:
       also, what will happen if a wraparound occurs? it would be great to 
describe it in this api doc

##########
File path: python/tvm/runtime/profiling.py
##########
@@ -0,0 +1,57 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Runtime profiling functions."""
+import tvm._ffi
+
+
+def start_timer(ctx):
+    """
+    Start a low-overhead device specific timer.
+
+    Params
+    ------
+    ctx: TVMContext
+        The context to get a timer for.
+
+    Returns
+    -------
+    timer_start: function
+        A function that stops the device specific timer. Calling this function
+        stops the timer and returns another function that returns the elapsed
+        time in nanoseconds. This third function should be called as late as

Review comment:
       explain the data type of the return value. if it is floating point, why 
not seconds?

##########
File path: src/runtime/rocm/rocm_device_api.cc
##########
@@ -200,5 +200,28 @@ TVM_REGISTER_GLOBAL("device_api.rocm").set_body([](TVMArgs 
args, TVMRetValue* rv
   DeviceAPI* ptr = ROCMDeviceAPI::Global();
   *rv = static_cast<void*>(ptr);
 });
+
+TVM_REGISTER_GLOBAL("profiling.timer.rocm").set_body_typed([](TVMContext ctx) {
+  hipEvent_t start;

Review comment:
       is it safe to copy these?

##########
File path: src/runtime/cuda/cuda_device_api.cc
##########
@@ -241,5 +242,28 @@ 
TVM_REGISTER_GLOBAL("device_api.cpu_pinned").set_body([](TVMArgs args, TVMRetVal
   *rv = static_cast<void*>(ptr);
 });
 
+TVM_REGISTER_GLOBAL("profiling.timer.gpu").set_body_typed([](TVMContext ctx) {
+  cudaEvent_t start;
+  cudaEvent_t stop;
+  cudaEventCreate(&start);
+  cudaEventCreate(&stop);
+  cudaEventRecord(start, CUDAThreadEntry::ThreadLocal()->stream);
+  return TypedPackedFunc<TypedPackedFunc<int64_t()>()>(
+      [=]() {
+        cudaEventRecord(stop, CUDAThreadEntry::ThreadLocal()->stream);
+        return TypedPackedFunc<int64_t()>(
+            [=]() -> int64_t {
+              cudaEventSynchronize(stop);
+              float milliseconds = 0;

Review comment:
       how does it work accessing `stop` from here when its copy was written to 
above? how might this work as a general mechanism? this does imply that 
starting and stopping a timer may load the dynamic memory allocator, which is 
bad for general timing API.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to