csullivan edited a comment on issue #7730:
URL: https://github.com/apache/tvm/issues/7730#issuecomment-806052775


   Thanks @comaniac @masahi. Yes, the problem is that different targets, and
target-specific TOPI implementations, can support different optimizations. When
the BLAS libraries supported for a target are used, implicit broadcast is not
supported.
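   To make "implicit broadcast" concrete, here is a minimal sketch in NumPy.
It is not TVM code: NumPy's `matmul` stands in for a generic batch matmul, and
the shapes are hypothetical. A generic kernel can treat a size-1 batch
dimension as repeating across the batch, while a kernel that requires matching
batch sizes needs the operand to be materialized at full size first.

```python
import numpy as np

# Hypothetical shapes: a batched activation and a constant weight
# whose batch dimension is 1.
batch, m, k, n = 8, 4, 16, 32
x = np.random.rand(batch, m, k).astype("float32")
w = np.random.rand(1, k, n).astype("float32")

# Implicit broadcast: matmul repeats the size-1 batch dim of `w`
# across all `batch` entries without copying `w`.
implicit = np.matmul(x, w)

# Explicit broadcast: materialize `w` with a full batch dimension,
# as a kernel that requires matching batch sizes would need.
w_full = np.ascontiguousarray(np.broadcast_to(w, (batch, k, n)))
explicit = np.matmul(x, w_full)

# Both forms compute the same result; only the operand layout differs.
assert implicit.shape == (batch, m, n)
assert np.allclose(implicit, explicit)
```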
   
   One option that comes to mind is a shape legalization pass that inserts
the broadcast when a target has specific attributes (e.g. libs=cublas/rocblas).
However, this isn't sufficient: depending on the op strategy priorities or the
applied tuning configs, the BLAS library implementation may not be used at all.
A better option could be to make use of #7518 and do the shape legalization
after the primitive functions have been lowered to TIR, when they can be
inspected.
   
   We could also disable implicit broadcast, but that can increase memory use
(from folding the constant broadcasts), which we've seen overflow device memory
for larger batch sizes.
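   As a rough illustration of why folding grows memory, the sketch below
computes the size of one (1, K, N) constant before and after it is folded into
an explicit (batch, K, N) tensor. The 1024x1024 float32 shape and the batch
sizes are hypothetical; the point is only that the folded constant scales
linearly with the batch size, once per broadcast constant in the model.

```python
def folded_broadcast_bytes(batch, k, n, dtype_bytes=4):
    """Bytes for a (1, k, n) constant vs. the same constant folded to (batch, k, n)."""
    original = k * n * dtype_bytes
    folded = batch * original
    return original, folded


# Hypothetical 1024x1024 float32 weight at a few batch sizes, reported in MiB.
for b in (1, 16, 64):
    orig, folded = folded_broadcast_bytes(b, 1024, 1024)
    print(f"batch={b}: original={orig // 2**20} MiB, folded={folded // 2**20} MiB")
```

With these numbers a single 4 MiB weight becomes 256 MiB at batch size 64.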

