masahi edited a comment on pull request #7056:
URL: https://github.com/apache/tvm/pull/7056#issuecomment-740919507


   > As a larger conversation, I don't love how much we're depending on thrust 
for functionality, I'd kind of like to fix the issues in topi around sort so we 
don't have to lean on thrust so much. We're depending on the nominally cuda topi 
kernels for a lot of other GPUs, so this trend makes it harder to support more 
diverse hardware.
   
   @mbrookhart @zhiics While I fully agree with this in general, for 
fundamental, low-level GPU primitives such as sorting, scan, etc., I think it 
would be really hard for generic implementations to match or outperform 
platform-specific libraries. These libraries have years of development behind 
them and use platform-specific intrinsics to maximize performance. The same 
applies to cuDNN, but unlike convolution ops, I don't think AutoTVM or Ansor 
would help generate efficient sort or scan ops.
   
   Sooner or later, I think we will introduce a `cumsum` op to TVM. On CUDA, 
`cumsum` can be implemented very efficiently via thrust or cub's 
`inclusive_scan`. Without it, we would have to roll our own GPU scan 
implementation that can compete with the vendor-provided one, which I think 
would be a formidable, near-impossible task.
   
   So my opinion is: while a native TVM solution is always what we should 
strive for, if there is a platform-specific library, we should embrace it. 
Sort, scan, etc. are standard enough that there is a good chance a 
platform-specific library is available. For example, rocm has its own 
implementation of thrust, and on OpenCL there is Boost.Compute.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

