yongfeng-nv commented on issue #4885: Split node min range is not stringent.
URL: https://github.com/apache/incubator-tvm/pull/4885#issuecomment-589508677

I updated this PR with the same change to the split node's nparts mode. It causes two more test failures:
 - tests/python/unittest/test_lang_tensor_overload_op.py verify_conv2d_scalar_bop
 - tests/python/unittest/test_codegen_blob.py test_resnet18

Further investigation shows that schedule_direct_cuda() in topi/python/topi/cuda/conv2d_direct.py binds IterVars from the output stage, the input padding stage, and the weight loading stage to threadIdx.x. The latter two stages are attached to the output stage. With my nparts change, the IterVars of the latter two stages have different ranges from the output stage's IterVar. As a result, TE generated incorrect CUDA code, which caused a numerical mismatch.

I propose to change schedule_direct_cuda() by removing the threadIdx.x binding in the input padding and weight loading stages.

I measured the performance of the two affected tests. After my change, the first one goes from 0.007306 ms to 0.005902 ms, and the second one from 30.47 ms to 30.59 ms.
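
To make the range mismatch concrete, here is a rough pure-Python sketch. It does not call TVM. The helper names (`split_nparts`, `bound_extents`), the stage extents, and the rule that the tightened outer extent is clamped to `min(nparts, extent)` are all assumptions made for illustration, not code or numbers taken from this PR.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def split_nparts(extent, nparts, tighten):
    """Toy model of range inference for split(nparts=...).

    Returns (outer_extent, inner_extent). With tighten=True the outer extent
    is clamped to the parent extent; this clamp is an assumption about what
    the nparts change does, not a copy of the TVM source.
    """
    outer = min(nparts, extent) if tighten else nparts
    inner = ceil_div(extent, nparts)
    return outer, inner


def bound_extents(stage_extents, nparts, tighten):
    """Outer extent that each stage would bind to the shared thread axis."""
    return {name: split_nparts(extent, nparts, tighten)[0]
            for name, extent in stage_extents.items()}


# Hypothetical loop extents; not taken from the failing tests.
stages = {"output": 8, "pad_input": 3, "load_weight": 5}
NPARTS = 8

before = bound_extents(stages, NPARTS, tighten=False)
after = bound_extents(stages, NPARTS, tighten=True)

# All stages agree on the thread extent only under the looser rule.
print("before:", before, "consistent:", len(set(before.values())) == 1)
print("after: ", after, "consistent:", len(set(after.values())) == 1)
```

Under this toy model, the looser rule gives every stage the same outer extent, so binding all three to one thread axis is consistent. With the tighter rule, the smaller stages get smaller extents than the output stage, which is the kind of disagreement described above.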
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
