mbrookhart opened a new pull request #7099: URL: https://github.com/apache/tvm/pull/7099
@Laurawly @zhiics @icemelon9 @csullivan @tkonolige

There have been many complaints recently about the stability and performance of the TIR-based CUDA sort kernel. I've spent a couple of days this week implementing a CUDA version of parallel mergesort. Because it is a stable sort, it fixes the flakiness we've seen with argsort and argwhere. It also changes the threading to support dynamic shapes, and it is significantly faster than the previous kernel.

This PR only addresses the core `sort_ir` function; extending this to the other sort variants in this file is future work.

I tested performance on a variety of shapes using this [script](https://gist.github.com/mbrookhart/c4730cbec48eaa4afcbf86d875847f9f) and obtained these numbers on my 1070 Ti. It's not as fast as Thrust, as expected, but it's much closer for all shapes tested here, and it even manages to beat Thrust on a few.

Thanks!

| Shape         | main    | thrust | this  |
|---------------|---------|--------|-------|
| (2000, 2, 2)  | 7.77    | 0.58   | 1.67  |
| (2, 2000, 2)  | 4.8     | 0.7    | 1.59  |
| (2, 2, 2000)  | 3.24    | 0.63   | 1.54  |
| (4000, 2, 2)  | 25.53   | 0.65   | 4.05  |
| (2, 4000, 2)  | 13.78   | 0.62   | 3.3   |
| (2, 2, 4000)  | 9.85    | 0.63   | 4.04  |
| (2, 12000, 2) | 369.99  | 0.68   | 13.87 |
| (2, 2, 12000) | 86.55   | 0.66   | 11.11 |
| (12000, 2, 2) | 486.65  | 0.66   | 13.69 |
| (2000, 8, 8)  | 259.21  | 10.4   | 4.22  |
| (8, 2000, 8)  | 111.14  | 8.45   | 3.43  |
| (8, 8, 2000)  | 50.37   | 9.05   | 3.05  |
| (4000, 8, 8)  | 671.53  | 8.24   | 9.58  |
| (8, 4000, 8)  | 368.59  | 8.47   | 10.12 |
| (8, 8, 4000)  | 171.18  | 8.74   | 6.27  |
| (12000, 8, 8) | 3571.97 | 15.22  | 42.99 |
| (8, 12000, 8) | 3517.72 | 15.07  | 45.84 |
| (8, 8, 12000) | 1417.97 | 15.03  | 27.57 |
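For readers unfamiliar with why a stable mergesort helps argsort, here is a minimal host-side sketch in plain Python. It is not the TIR kernel from this PR: the function name `stable_argsort` is hypothetical and the code only illustrates the general bottom-up merge idea, assuming a 1-D list of comparable values.

```python
def stable_argsort(values):
    """Bottom-up merge sort that returns indices; ties keep input order.

    Illustrative sketch only, not the sort_ir kernel from this PR.
    """
    n = len(values)
    src = list(range(n))  # current index permutation
    dst = [0] * n         # scratch buffer for the merged result
    width = 1
    while width < n:
        # Every [start, start + 2 * width) merge in this pass is independent
        # of the others, so a GPU version could hand them to separate threads.
        for start in range(0, n, 2 * width):
            mid = min(start + width, n)
            end = min(start + 2 * width, n)
            i, j, k = start, mid, start
            while i < mid and j < end:
                # "<=" prefers the left run on ties, which preserves stability.
                if values[src[i]] <= values[src[j]]:
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
                k += 1
            # Copy whatever remains of either run.
            while i < mid:
                dst[k] = src[i]
                i += 1
                k += 1
            while j < end:
                dst[k] = src[j]
                j += 1
                k += 1
        src, dst = dst, src  # ping-pong the buffers between passes
        width *= 2
    return src


print(stable_argsort([3, 1, 2, 1, 3]))  # [1, 3, 2, 0, 4]
```

Equal keys always come back in their original order, so the output does not vary from run to run, which is the property an unstable sort cannot guarantee for argsort.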
