[GitHub] [tvm] AndrewZhaoLuo opened a new pull request, #13599: [Contrib] Faster Top-K Impl.

GitBox Mon, 12 Dec 2022 13:23:26 -0800


AndrewZhaoLuo opened a new pull request, #13599:
URL: https://github.com/apache/tvm/pull/13599


   This is a simple rewrite of hand-coded top-k function used for CPU targets. 
   
   The old implementation sorted each axis and then took the biggest k elements.
   
   The new implementation does a single pass of each axis, keeping a min heap 
to store the top-k elements up to that point. 
   
   If n is the size of the array, and we want to find top k, the old 
implementation has runtime in O(nlogn) with additional memory O(n) to store the 
sorted array. The new implementation is O(n log k), and in practice is probably 
amortized to O(n / k * log k) in many scenarios and only requires O(k). Note n 
>> k.
   
   In practice this new kernel led to a 20x speedup over existing one. On a 
Xeon Platinum 8370C CPU @ 2.80GHz for input shape [1, 3050] with k = 15, the 
latency went from 300us --> ~10us. There is probably more room for shaving off 
a little more time on the scale of a single us's, however I have determined it 
to not be worth it. 
   
   This change however is probably in the range of worth comitting.
   
   TODO: 
   
   Wider benchmark suite over many shapes + k + hardware platforms.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm] AndrewZhaoLuo opened a new pull request, #13599: [Contrib] Faster Top-K Impl.

Reply via email to