Hi,
I tried to tune my conv2d workload `['NHWC', (32, 300, 300, 64)]`, but it failed because `cuLaunchKernel` was called with grid dimensions (2, 4, 90000), and 90000 exceeds the limit of 65535. The oversized dimension comes from fusing the H and W axes and binding the result to `block_z`:

```
bz = s[output].fuse(hi, wi)
s[output].bind(bz, block_z)
```

It seems there should be a tiling config along the H/W direction so that all shapes are supported.
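To make the arithmetic concrete, here is a minimal sketch in plain Python (no TVM calls) of why the fused axis overflows and what tile size would bring it back under the limit. The helper names `min_tile_for_axis` and `blocks_after_tiling` are hypothetical and are not part of the schedule under discussion; the limits in `MAX_GRID_DIM` are the CUDA grid limits for compute capability 3.0 and later.

```python
import math

# CUDA grid-dimension limits (compute capability 3.0 and later).
MAX_GRID_DIM = {"x": 2**31 - 1, "y": 65535, "z": 65535}


def min_tile_for_axis(extent, limit):
    """Hypothetical helper: smallest tile size such that
    ceil(extent / tile) does not exceed `limit`."""
    return math.ceil(extent / limit)


def blocks_after_tiling(extent, tile):
    """Hypothetical helper: number of blocks left on the axis
    once each block covers `tile` elements."""
    return math.ceil(extent / tile)


# Workload from the post: NHWC, (32, 300, 300, 64).
h, w = 300, 300
fused_hw = h * w  # extent of fuse(hi, wi), bound to block_z

# The fused extent is larger than what blockIdx.z can address.
overflows = fused_hw > MAX_GRID_DIM["z"]

# Tiling the fused axis brings the block count back under the limit.
tile = min_tile_for_axis(fused_hw, MAX_GRID_DIM["z"])
blocks = blocks_after_tiling(fused_hw, tile)

print(f"fused H*W = {fused_hw}, overflows block_z: {overflows}")
print(f"tile = {tile}, blocks on z after tiling = {blocks}")
```

In a schedule this would presumably correspond to splitting the fused axis (or tiling `hi` and `wi` separately) before binding the outer part to `block_z`, with the tile factor exposed as a tuning knob. Another option might be to bind the fused axis to the x dimension of the grid, whose limit is much larger, but that would require rearranging the other two bindings.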