Hi,

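I tried to tune my conv2d workload ['NHWC', (32, 300, 300, 64)], but it 
failed at cuLaunchKernel: the grid dimensions were (2, 4, 90000), and the z 
dimension exceeds the CUDA limit of 65535.

The cause seems to be this part of the schedule: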

```
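# hi and wi are fused, so the extent is H * W = 300 * 300 = 90000 here
bz = s[output].fuse(hi, wi)
# the whole fused extent is bound to blockIdx.z
s[output].bind(bz, block_z)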
```
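Because the fused H*W extent is bound directly to block_z, any shape with 
H*W > 65535 will fail to launch. It seems like there should be a tiling 
config along the H/W dimensions so that all shapes are supported.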
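
As a rough illustration of the kind of H/W tiling meant here, the sketch below computes the smallest tile factor that keeps the z grid dimension within the limit. The helper names (`min_hw_tile`, `grid_z_after_tiling`) are hypothetical and not part of TVM. How the factor would be wired into the schedule (for example by splitting `bz` with `s[output].split(bz, factor=...)` before binding, or by exposing it as a tuning knob) is an assumption, not the existing implementation.

```python
# CUDA limits gridDim.y and gridDim.z to 65535 blocks.
CUDA_MAX_GRID_DIM_Z = 65535


def min_hw_tile(h, w, limit=CUDA_MAX_GRID_DIM_Z):
    """Smallest tile factor such that ceil(h * w / factor) <= limit.

    Hypothetical helper: h and w are the output height and width whose
    fused extent would otherwise be bound to blockIdx.z.
    """
    fused = h * w
    # Integer ceiling division, avoids floating point.
    return (fused + limit - 1) // limit


def grid_z_after_tiling(h, w, factor):
    """Number of blocks along z once the fused H*W axis is split by factor."""
    return (h * w + factor - 1) // factor


if __name__ == "__main__":
    # Workload from this post: NHWC (32, 300, 300, 64), so H * W = 90000.
    factor = min_hw_tile(300, 300)
    print(factor, grid_z_after_tiling(300, 300, factor))
```

For the 300x300 case this gives a factor of 2, which brings the z grid dimension down to 45000. In practice a larger factor might perform better, so it probably makes sense to tune it rather than always pick the minimum.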

---
[Visit 
Topic](https://discuss.tvm.ai/t/rfc-tensor-core-optimization-of-cnns-on-tensor-core/6004/21)
 to respond.
