jwfromm opened a new pull request #6616:
URL: https://github.com/apache/incubator-tvm/pull/6616


   We found that requiring explicit broadcasting along the batch dimension for 
`batch_matmul` could cause serious memory issues during constant folding, since 
it effectively multiplies the size of the weights by the input batch size. This 
PR allows implicit broadcasting along the batch dimension for `batch_matmul` 
without increasing compute or memory requirements, and it should give 
significant speedups in cases where we previously applied explicit 
broadcasting. I also noticed that we had an unused C++ definition of 
`batch_matmul` and removed it to prevent confusion.
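   To illustrate the memory issue, here is a minimal numpy sketch of the broadcasting semantics (shapes and names are hypothetical, and numpy's `matmul` convention is used rather than TVM's exact `batch_matmul` layout):

```python
import numpy as np

# Hypothetical shapes: a batched activation and a single shared weight.
batch, M, K, N = 8, 32, 64, 16
x = np.random.rand(batch, M, K).astype("float32")
w = np.random.rand(1, K, N).astype("float32")  # batch dim of 1

# Explicit broadcasting materializes `batch` copies of the weight,
# multiplying its memory footprint by the batch size. If the weight is a
# constant, constant folding bakes this enlarged tensor into the module.
w_explicit = np.broadcast_to(w, (batch, K, N))
assert w_explicit.nbytes == batch * w.nbytes

# Implicit broadcasting: matmul broadcasts the batch dimension itself,
# so the weight is stored once regardless of batch size, and the result
# is identical.
out = np.matmul(x, w)
assert out.shape == (batch, M, N)
assert np.allclose(out, np.matmul(x, w_explicit))
```

   The sketch shows why no extra compute or memory is needed: the same weight block is simply reused for every batch element instead of being copied.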


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

