JCBrouwer commented on issue #10223:
URL: https://github.com/apache/tvm/issues/10223#issuecomment-1052408332


   > How TVM + cuDNN compares to PT? Since you are running on fp16, I'd hope 
that we can use tensorcore. But I've never seen grouped convolution running on 
tensorcore. Also cutlass is generally faster than cuDNN but it doesn't support 
grouped or depth wise afaik.
   
   I'm not quite sure what the best way to benchmark/profile things is. I've 
been trying to use `ncu` (Nsight Compute), but it's __very__ slow on the 
PyTorch models.
   
   At the moment the PyTorch models (vanilla, traced, optimize_for_inference) 
reach about 9-11 fps, and TVM + cuDNN reaches about 15-19 fps. I'm hoping to 
get into the 25-30 fps range.
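For comparing the two backends without `ncu`'s overhead, a simple wall-clock harness has been enough for me. This is just a minimal sketch using the standard library; `measure_fps` and `run_frame` are made-up names, and for CUDA models the callable should synchronize internally (e.g. call `torch.cuda.synchronize()`) so the timing isn't hiding queued kernels:

```python
import time

def measure_fps(run_frame, n_warmup=10, n_iters=100):
    """Rough frames-per-second estimate for any zero-arg callable
    that produces one frame (e.g. a wrapper around model inference).

    NOTE: for GPU backends the callable must block until the frame is
    actually done (torch.cuda.synchronize() / TVM's dev.sync()), or the
    numbers will be optimistic.
    """
    # Warm-up iterations absorb JIT compilation, cuDNN autotuning, etc.
    for _ in range(n_warmup):
        run_frame()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_frame()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

The warm-up matters here because cuDNN's algorithm selection and any lazy initialization would otherwise be charged to the first timed frame.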
   
   I'm trying to get more of the computation done in fp16, but I've run into 
this issue #10397 .
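In case it helps others hitting the same wall: the route I've been trying is Relay's mixed-precision pass, applied before building. This is only a sketch of the compile-pipeline configuration, assuming `mod` and `params` came from a frontend importer like `relay.frontend.from_pytorch`; whether the pass converts every op in a given model cleanly is exactly what the issue above is about:

```python
import tvm
from tvm import relay

def build_fp16(mod, params, target="cuda -libs=cudnn"):
    # Type inference must run before the mixed-precision rewrite.
    mod = relay.transform.InferType()(mod)
    # ToMixedPrecision rewrites eligible ops to float16 based on
    # per-op always/follow/never conversion lists.
    mod = relay.transform.ToMixedPrecision("float16")(mod)
    with tvm.transform.PassContext(opt_level=3):
        return relay.build(mod, target=target, params=params)
```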
   
   > You may try our auto-scheduler to see if it can beat cuDNN.
   
   So far the autotvm tuner hasn't worked for me. It took a couple of days to 
tune all of the ops with the default settings from the tutorial, and the result 
actually ended up slower than the untuned TVM + cuDNN version.
   
   I haven't looked too deeply into the auto-scheduler yet because I couldn't 
find a good tutorial on applying it to a large model (I think the only 
tutorials cover single ops?).
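From what I can tell, the auto-scheduler does handle whole models: you extract one task per fused subgraph from the Relay module and let a task scheduler allocate trials across them. A rough sketch based on TVM's `auto_scheduler` API; the log-file name and trial count are placeholders, and this would need real tuning budget to beat cuDNN:

```python
import tvm
from tvm import auto_scheduler, relay

def tune_whole_model(mod, params, target="cuda",
                     log_file="tuning.json", trials=20000):
    # One tuning task per distinct fused subgraph in the model.
    tasks, task_weights = auto_scheduler.extract_tasks(
        mod["main"], params, target)
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tuner.tune(auto_scheduler.TuningOptions(
        num_measure_trials=trials,  # total budget, shared across tasks
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    ))
    # Rebuild the model with the best found schedules applied.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
                opt_level=3,
                config={"relay.backend.use_auto_scheduler": True}):
            return relay.build(mod, target=target, params=params)
```

Note that this sketch deliberately targets plain `cuda`: any op offloaded to cuDNN via `-libs=cudnn` is opaque to the scheduler and won't be tuned.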
   
   I also figured it would be less effective while using cuDNN, since ops 
offloaded to the library are out of the scheduler's hands, although I'm not 
sure that's actually the case.

