abhikran-quic edited a comment on pull request #9186:
URL: https://github.com/apache/tvm/pull/9186#issuecomment-971543884


   > @abhikran-quic friendly ping to see if updating this PR is currently 
blocked, thanks!
   
   Hi @tmoreau89, thanks! I have a working version of the op ready, as suggested by @AndrewZhaoLuo. I should have it ready for review in a couple of days.
   
   However, I'd still need some help figuring out the error seen in the GPU tests for the op.
   
   I've verified the tests for x86 and ARM architectures, but since I don't have a GPU, I am unable to run the CUDA tests.
   
   My observations regarding the error are below:
   
   ```
   Check failed: ret == 0 (-1 vs. 0) : Assert fail: (((tir.tvm_struct_get(arg2, 0, 5) == (uint8)0) && (tir.tvm_struct_get(arg2, 0, 6) == (uint8)16)) && (tir.tvm_struct_get(arg2, 0, 7) == (uint16)1)), arg2.dtype is expected to be int16
   ```
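   For context, the three `tir.tvm_struct_get` fields in the assert (indices 5, 6, and 7) read the DLTensor dtype's `code`, `bits`, and `lanes`; this is my reading of TVM's argument-checking codegen, so treat the sketch below as illustrative. Decoding the triples shows why the check fires:

   ```python
   # Illustrative decoding of the (code, bits, lanes) dtype triple that the
   # generated assert compares. Codes follow the DLPack convention:
   # 0 = int, 1 = uint, 2 = float.
   CODE_NAMES = {0: "int", 1: "uint", 2: "float"}

   def dtype_name(code, bits, lanes):
       base = f"{CODE_NAMES[code]}{bits}"
       return base if lanes == 1 else f"{base}x{lanes}"

   # The assert expects code=0, bits=16, lanes=1:
   print(dtype_name(0, 16, 1))   # int16
   # The output tensor actually passed has dtype uint32 (code=1, bits=32):
   print(dtype_name(1, 32, 1))   # uint32
   ```

   So the compiled function was built expecting an int16 output buffer, while the test hands it a uint32 one, which matches observation 1 below.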
   1. Here, arg2 corresponds to the output array. The output dtype is expected to be int16, whereas I'm explicitly specifying the output dtype as uint32 to batch_matmul: 
https://github.com/apache/tvm/pull/9186/files#diff-023ca730d77056c7a76a1c9b7f4f3f41ec985f8ec58cb346129925c7078dce7aR907
   2. The Relay strategies of the batch_matmul op for x86 and CUDA are very different. Could this be the reason the output dtype is being altered in the CUDA tests?
   3. Is there a way to run this test on x86?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
