yongfeng-nv commented on issue #4885: Split node min range is not stringent.
URL: https://github.com/apache/incubator-tvm/pull/4885#issuecomment-589508677
 
 
   Updated this PR with the same change for the split node's nparts mode.  This causes two more test failures:
   
   - tests/python/unittest/test_lang_tensor_overload_op.py verify_conv2d_scalar_bop
   - tests/python/unittest/test_codegen_blob.py test_resnet18
   
   Further investigation shows that schedule_direct_cuda() in topi/python/topi/cuda/conv2d_direct.py binds IterVars from three stages to threadIdx.x: the output stage, the input padding stage, and the weight loading stage.  The latter two stages are attached to the output stage.  With my nparts change, the latter two stages' IterVars have different ranges from the output stage's IterVar.  As a result, TE generated incorrect CUDA code, which caused the numerical mismatch.
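   To make the failure mode concrete, here is a toy model in plain Python, not TVM code.  It assumes, for illustration only, that a tightened nparts split reports an outer extent of ceil(extent / ceil(extent / nparts)) instead of nparts, and that the thread launch size comes from the output stage.  The helpers nparts_split and covered and the extents 9 and 10 are hypothetical; they are not TVM's implementation.  The point is only that when two stages share threadIdx.x but disagree on its range, some elements of one stage can go uncomputed.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def nparts_split(extent, nparts):
    """Hypothetical nparts split: returns (tight_outer, factor).

    factor is ceil(extent / nparts); tight_outer is the smallest outer
    extent that still covers `extent` with that factor, which can be
    smaller than nparts.  Illustrative assumption, not TVM's code.
    """
    factor = ceil_div(extent, nparts)
    tight_outer = ceil_div(extent, factor)
    return tight_outer, factor


def covered(extent, factor, num_threads):
    """Indices written when each thread handles one contiguous chunk of
    `factor` elements, with a bounds guard `idx < extent`."""
    done = set()
    for tx in range(num_threads):
        for i in range(factor):
            idx = tx * factor + i
            if idx < extent:
                done.add(idx)
    return done


# Hypothetical extents for an output stage and an attached stage that
# both bind their outer axis to threadIdx.x.
out_outer, _ = nparts_split(9, 4)             # tight outer extent 3
pad_outer, pad_factor = nparts_split(10, 4)   # tight outer extent 4

launched = out_outer  # assumption: launch size taken from the output stage
missing = set(range(10)) - covered(10, pad_factor, launched)
print(out_outer, pad_outer, sorted(missing))  # prints: 3 4 [9]
```

   In this toy model the attached stage needs 4 threads but only 3 are launched, so element 9 is never written, which would surface as a numerical mismatch.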
   
   I propose changing schedule_direct_cuda() to remove the threadIdx.x binding in the input padding and weight loading stages.  I measured the performance of the two affected tests.  With this change, the first one goes from 0.007306 ms to 0.005902 ms, and the second one from 30.47 ms to 30.59 ms.
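   For a quick sense of scale, the relative changes can be computed from the runtimes above.  The mapping of "first" and "second" to the two listed tests follows the order of the failure list; the helper pct_change is just for illustration.

```python
def pct_change(before_ms, after_ms):
    """Relative runtime change in percent; negative means faster."""
    return (after_ms - before_ms) / before_ms * 100.0


# Runtimes from this comment, as (before, after) in milliseconds.
results = {
    "verify_conv2d_scalar_bop": (0.007306, 0.005902),
    "test_resnet18": (30.47, 30.59),
}

for name, (before, after) in results.items():
    print(f"{name}: {pct_change(before, after):+.2f}%")
# verify_conv2d_scalar_bop: -19.22%
# test_resnet18: +0.39%
```

   So the first test gets about 19% faster, while the second slows down by under half a percent.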
   
   
   
