elvin-n opened a new pull request, #13100: URL: https://github.com/apache/tvm/pull/13100
The original CUDA schedule uses rfactor, which on Adreno runs 10x-50x slower than a schedule without barriers. For example, `mean` on a QHD image on Snapdragon 888 takes 69 ms with the CUDA schedule and 6.2 ms with the newly proposed schedule. `argmin` shows the same pattern: 183 ms -> 3.9 ms.
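The two reported measurements can be checked against the 10x-50x range stated above with a quick calculation. This is only arithmetic on the numbers from the description. The `speedup` helper and the `timings_ms` table are illustrative names and are not part of TVM.

```python
# Timings from the PR description, in milliseconds (Snapdragon 888, QHD input).
# Each entry is (CUDA rfactor schedule, proposed barrier-free schedule).
timings_ms = {
    "mean": (69.0, 6.2),
    "argmin": (183.0, 3.9),
}


def speedup(old_ms: float, new_ms: float) -> float:
    """Return how many times faster the new schedule is than the old one."""
    return old_ms / new_ms


for op, (old, new) in timings_ms.items():
    print(f"{op}: {old} ms -> {new} ms, {speedup(old, new):.1f}x faster")
```

This gives roughly 11.1x for `mean` and 46.9x for `argmin`. Both values fall inside the quoted 10x-50x range.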
