Hello! I wrote an op composed of four CUDA kernels, and I now want to optimize it, so I need to know what fraction of the total time each of the four kernels takes. I tried nvprof, but could not use it because of permission issues. Does TVM have a similar profiling function? My current test code is as follows:

    import numpy as np
    import tvm
    from tvm.contrib import graph_runtime

    # graph, lib, params, ctx and input_shape come from the earlier build step (not shown)
    module = graph_runtime.create(graph, lib, ctx)
    data_tvm = tvm.nd.array(np.random.uniform(size=input_shape).astype("float16"))
    module.set_input('data', data_tvm)
    module.set_input(**params)
    module.run()
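
To make the goal concrete: once a time is available for each kernel, the ratio I am after is just each kernel's share of the total. As far as I know, TVM's debug runtime (`tvm.contrib.debugger.debug_runtime`) reports times per graph node, but whether that separates the four kernels inside one op is an assumption I have not verified. The sketch below only shows the ratio calculation; the kernel names and timings are made-up placeholders, not measurements.

```python
def kernel_time_shares(times_us):
    """Return each kernel's share of the total time, in percent."""
    total = sum(times_us.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {name: 100.0 * t / total for name, t in times_us.items()}


# Hypothetical per-kernel timings in microseconds (placeholders, not real data).
example_times_us = {
    "kernel0": 120.0,
    "kernel1": 40.0,
    "kernel2": 30.0,
    "kernel3": 10.0,
}

shares = kernel_time_shares(example_times_us)

# Print the kernels from most to least expensive.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

With real numbers in place of the placeholders, the kernel at the top of the list is the one worth optimizing first.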