jwfromm opened a new pull request #6616: URL: https://github.com/apache/incubator-tvm/pull/6616
We found that requiring explicit broadcasting along the batch dimension for `batch_matmul` could cause serious memory issues during constant folding, since it effectively multiplies the size of the weights by the input batch size. This PR allows implicit broadcasting along the batch dimension for `batch_matmul` without increasing compute or memory requirements, and it should give significant speedups in cases where we previously applied explicit broadcasting. I also noticed that we had an unused C++ definition of `batch_matmul` and removed it to prevent confusion.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
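To illustrate the idea, here is a minimal NumPy sketch of implicit batch broadcasting for a `batch_matmul`-style operator (Relay's convention, where the second operand is transposed on its last two dimensions). This is an assumption-laden illustration, not TVM's actual implementation: `np.broadcast_to` returns a view, so a constant weight with batch size 1 is never materialized `B` times, which is exactly the memory blow-up that explicit broadcasting during constant folding would cause.

```python
import numpy as np

def batch_matmul(x, y):
    # out[b, i, j] = sum_k x[b, i, k] * y[b, j, k]  (second operand transposed)
    # A batch dimension of 1 is broadcast implicitly: broadcast_to creates a
    # view, so no extra memory is allocated for the repeated weight.
    batch = max(x.shape[0], y.shape[0])
    xb = np.broadcast_to(x, (batch,) + x.shape[1:])
    yb = np.broadcast_to(y, (batch,) + y.shape[1:])
    return np.einsum('bik,bjk->bij', xb, yb)

x = np.random.rand(8, 4, 16)  # activations, batch 8
w = np.random.rand(1, 5, 16)  # constant weight, batch 1 (not tiled)
out = batch_matmul(x, w)      # shape (8, 4, 5)
```

Explicitly tiling `w` to batch 8 before the matmul would produce the same numerical result while using eight times the weight memory; the implicit path avoids that copy entirely.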
