Hello!
I wrote an op composed of four CUDA kernels and now want to optimize it,
so I need to know what fraction of the total run time each of the four kernels accounts for.
I tried nvprof, but could not use it because of permission issues.
Does TVM provide a similar profiling function?
My current test code is as follows:

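        import numpy as np
        import tvm
        from tvm.contrib import graph_runtime

        # graph, lib, params, ctx and input_shape are defined elsewhere (not shown)
        module = graph_runtime.create(graph, lib, ctx)
        data_tvm = tvm.nd.array(np.random.uniform(size=input_shape).astype("float16"))
        module.set_input('data', data_tvm)
        module.set_input(**params)
        module.run()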
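
To make clear what I mean by the time ratio: once I have a time for each kernel, I want each kernel's share of the total. Here is a minimal sketch of that calculation in plain Python. The helper name `kernel_time_shares`, the kernel names, and the timings are all made-up placeholders, not measurements and not part of any TVM API.

```python
def kernel_time_shares(times_ms):
    """Return each kernel's share of the total time, as a percentage."""
    total = sum(times_ms.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {name: 100.0 * t / total for name, t in times_ms.items()}


# Placeholder timings in milliseconds (hypothetical, not measured).
example_times_ms = {
    "kernel_0": 1.0,
    "kernel_1": 2.0,
    "kernel_2": 3.0,
    "kernel_3": 4.0,
}

shares = kernel_time_shares(example_times_ms)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

What I am missing is a way to obtain the per-kernel times that would go into such a dictionary.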




