kazum opened a new pull request #5097: [TOPI][OP] Use Thrust sort for argsort and topk
URL: https://github.com/apache/incubator-tvm/pull/5097

The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces a Thrust implementation of sort, which is much faster.

Note that this change requires CMake 3.8 or later, since we have to use nvcc to compile the Thrust code.

- benchmark script
```python
import tvm
from tvm import relay
from tvm.contrib import graph_runtime
import numpy as np

target = 'cuda'
ctx = tvm.gpu(0)

# Build a single topk over a 1-D tensor of n elements.
n = 100000
x = relay.var("x", shape=(n,))
out = relay.topk(x)
func = relay.Function([x], out[0])

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target)

module = graph_runtime.create(graph, lib, ctx)

print("Evaluate inference time cost...")
ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=3)
# Convert the measured times from seconds to milliseconds.
prof_res = np.array(ftimer().results) * 1000
print("Mean inference time (std dev): %.2f ms (%.2f ms)" % (np.mean(prof_res), np.std(prof_res)))
```

- result (without thrust)
```
Evaluate inference time cost...
Mean inference time (std dev): 2058.89 ms (0.07 ms)
```

- result (with thrust)
```
Evaluate inference time cost...
Mean inference time (std dev): 1.11 ms (0.03 ms)
```

@icemelon9 @vinx13 @masahi could you help review?
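
For context on why odd-even transposition sort scales poorly, here is a minimal CPU sketch in plain Python. It is an illustration only and not the TOPI CUDA kernel; the function name `odd_even_transposition_sort` and the comparison counter are made up for this example. The algorithm needs n sequential phases, so the total work is n(n-1)/2 compare-exchange steps, which for n = 100000 is about 5e9.

```python
def odd_even_transposition_sort(values):
    """Sort a copy of `values` and count the compare-exchange steps performed."""
    data = list(values)
    n = len(data)
    comparisons = 0
    # n phases are enough to guarantee a sorted result.
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...;
        # odd phases compare pairs (1,2), (3,4), ...
        start = phase % 2
        for i in range(start, n - 1, 2):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons


sorted_data, comparisons = odd_even_transposition_sort([5, 1, 4, 2, 3, 0])
print(sorted_data)
print(comparisons)
```

On a GPU the compare-exchange steps inside one phase can run in parallel, but the phases themselves still have to run one after another, so the number of sequential steps grows linearly with the input size.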
