mbrookhart opened a new pull request #7099:
URL: https://github.com/apache/tvm/pull/7099


   @Laurawly @zhiics @icemelon9 @csullivan @tkonolige 
   
   There have been many complaints recently about the stability and performance 
of the TIR-based CUDA sort kernel. I've spent a couple of days this week getting 
a CUDA version of parallel mergesort working. Because mergesort is a stable 
sort, it fixes the flakiness we've seen with argsort and argwhere. It also 
changes the threading to support dynamic shapes, and it significantly improves 
performance over the previous kernel.
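
To illustrate why a stable sort removes the tie-ordering flakiness in argsort, here is a minimal pure-Python sketch of a bottom-up merge sort that returns indices. It is not the TIR kernel from this PR; the function name `stable_argsort` is hypothetical, and the loop over blocks only stands in for work that a GPU kernel could distribute across threads.

```python
def stable_argsort(values):
    """Return indices that sort `values`; equal elements keep their input order."""
    n = len(values)
    src = list(range(n))   # current index permutation (sorted runs of size `width`)
    dst = [0] * n          # scratch buffer for the merged runs
    width = 1
    while width < n:
        # Every block [start, start + 2 * width) is merged independently of the
        # others, so in a parallel version each block could go to its own thread.
        for start in range(0, n, 2 * width):
            mid = min(start + width, n)
            end = min(start + 2 * width, n)
            i, j, k = start, mid, start
            while i < mid and j < end:
                # "<=" takes the left run on ties, which is what makes it stable.
                if values[src[i]] <= values[src[j]]:
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
                k += 1
            # Copy whatever remains of the left or right run.
            while i < mid:
                dst[k] = src[i]
                i += 1
                k += 1
            while j < end:
                dst[k] = src[j]
                j += 1
                k += 1
        src, dst = dst, src
        width *= 2
    return src


# Ties (the two 3s and the two 1s) come back in their original relative order.
print(stable_argsort([3, 1, 3, 1, 2]))
```

With an unstable sort, the indices of equal elements may come back in either order, which is the kind of run-to-run difference that shows up as flaky argsort and argwhere tests.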
   
   This PR only addresses the core sort_ir function; extending this to the 
other versions of sort in this file is future work. 
   
   I tested performance on a variety of shapes using this 
[script](https://gist.github.com/mbrookhart/c4730cbec48eaa4afcbf86d875847f9f) 
and obtained these numbers on my GTX 1070 Ti. It's not as fast as Thrust, as 
expected, but it's much closer than main for all shapes tested here, and it 
even manages to beat Thrust on a few. 
   
   Thanks!
   
   | Shape         | main    | Thrust | this  |
   |---------------|---------|--------|-------|
   | (2000, 2, 2)  | 7.77    | 0.58   | 1.67  |
   | (2, 2000, 2)  | 4.8     | 0.7    | 1.59  |
   | (2, 2, 2000)  | 3.24    | 0.63   | 1.54  |
   | (4000, 2, 2)  | 25.53   | 0.65   | 4.05  |
   | (2, 4000, 2)  | 13.78   | 0.62   | 3.3   |
   | (2, 2, 4000)  | 9.85    | 0.63   | 4.04  |
   | (2, 12000, 2) | 369.99  | 0.68   | 13.87 |
   | (2, 2, 12000) | 86.55   | 0.66   | 11.11 |
   | (12000, 2, 2) | 486.65  | 0.66   | 13.69 |
   | (2000, 8, 8)  | 259.21  | 10.4   | 4.22  |
   | (8, 2000, 8)  | 111.14  | 8.45   | 3.43  |
   | (8, 8, 2000)  | 50.37   | 9.05   | 3.05  |
   | (4000, 8, 8)  | 671.53  | 8.24   | 9.58  |
   | (8, 4000, 8)  | 368.59  | 8.47   | 10.12 |
   | (8, 8, 4000)  | 171.18  | 8.74   | 6.27  |
   | (12000, 8, 8) | 3571.97 | 15.22  | 42.99 |
   | (8, 12000, 8) | 3517.72 | 15.07  | 45.84 |
   | (8, 8, 12000) | 1417.97 | 15.03  | 27.57 |

