nverke opened a new pull request, #13005:
URL: https://github.com/apache/tvm/pull/13005

   …ing on hexagon.
   
   The purpose of this test is to show how to create a pipeline that uses async DMA copies to VTCM for a performance speedup. It compares performance results across several different schedules and should serve as a good starting point for others wishing to take advantage of the new features.
   
   
   Approximated Activation Shape | Approximated Weight Shape | Approximated Complexity (GOPS) | Total Memory Transferred (MB) | N, N, N | S, S, S | B, B, B | A, B, B | B, B, A | A, B, A | Async DMA Speedup
   -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
   (1, 32, 32, 128) | (1, 1, 1, 128) | 0.001 | 0.26 | 0.0547 | 0.0715 | 0.6492 | 0.6172 | 0.1398 | 0.1033 | 0.53
   (1, 32, 32, 128) | (1, 3, 3, 128) | 0.005 | 0.26 | 0.0678 | 0.085 | 0.6636 | 0.6328 | 0.1526 | 0.1201 | 0.56
   (1, 32, 32, 128) | (1, 7, 7, 128) | 0.026 | 0.27 | 0.1264 | 0.1463 | 0.7283 | 0.6952 | 0.2178 | 0.1843 | 0.69
   (1, 32, 32, 128) | (1, 9, 9, 128) | 0.042 | 0.27 | 0.1689 | 0.1954 | 0.7742 | 0.7426 | 0.2626 | 0.2317 | 0.73
   (1, 64, 64, 128) | (1, 1, 1, 128) | 0.002 | 1.05 | 0.2237 | 0.2812 | 2.6897 | 2.3175 | 0.4083 | 0.2715 | 0.82
   (1, 64, 64, 128) | (1, 3, 3, 128) | 0.019 | 1.05 | 0.375 | 0.3362 | 2.7396 | 2.3716 | 0.4621 | 0.3262 | 1.15
   (1, 64, 64, 128) | (1, 7, 7, 128) | 0.103 | 1.05 | 0.6882 | 0.6202 | 3.008 | 2.6313 | 0.7207 | 0.5853 | 1.18
   (1, 64, 64, 128) | (1, 9, 9, 128) | 0.17 | 1.06 | 0.879 | 0.8318 | 3.1913 | 2.8071 | 0.8981 | 0.7623 | 1.15
   (1, 128, 128, 128) | (1, 1, 1, 128) | 0.008 | 4.19 | 1.0594 | 1.4324 | 11.6076 | 9.2136 | 2.7771 | 0.9508 | 1.11
   (1, 128, 128, 128) | (1, 3, 3, 128) | 0.075 | 4.2 | 2.2994 | 2.4282 | 11.9346 | 9.4362 | 3.0186 | 1.1717 | 1.96
   (1, 128, 128, 128) | (1, 7, 7, 128) | 0.411 | 4.2 | 5.0044 | 5.1789 | 12.9821 | 10.4726 | 3.8554 | 2.2011 | 2.27
   (1, 128, 128, 128) | (1, 9, 9, 128) | 0.679 | 4.2 | 7.6593 | 7.0915 | 13.7597 | 11.1763 | 4.7059 | 2.9003 | 2.64
   
   Each timing column (N, N, N through A, B, A) specifies the data copy method used for the Activation, Weight, and Output vectors, in that order, with the following options:
   N = No copying to VTCM
   S = Synchronous DMA copies to VTCM 
   B = Basic/Naive copies to VTCM
   A = Asynchronous DMA copies to VTCM 
   
   For example, B, B, A uses naive copies for the Activation and Weight input vectors and async DMA copies for the Output vector (VTCM -> DDR).
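
As a reading aid for the table, here is a minimal Python sketch that decodes a column label into per-vector copy methods and recomputes the last column. The helper names `decode_label` and `speedup` are hypothetical and are not part of the test in this PR. The sketch assumes that "Async DMA Speedup" is the N, N, N timing divided by the A, B, A timing; the description does not state this, but the ratio reproduces every reported value to two decimals.

```python
# Copy-method codes, taken from the legend above.
COPY_METHODS = {
    "N": "no copy to VTCM",
    "S": "synchronous DMA copy to VTCM",
    "B": "basic/naive copy to VTCM",
    "A": "asynchronous DMA copy to VTCM",
}

# (activation shape, weight shape, N,N,N timing, A,B,A timing, reported speedup),
# copied from the table above.
ROWS = [
    ((1, 32, 32, 128), (1, 1, 1, 128), 0.0547, 0.1033, 0.53),
    ((1, 32, 32, 128), (1, 3, 3, 128), 0.0678, 0.1201, 0.56),
    ((1, 32, 32, 128), (1, 7, 7, 128), 0.1264, 0.1843, 0.69),
    ((1, 32, 32, 128), (1, 9, 9, 128), 0.1689, 0.2317, 0.73),
    ((1, 64, 64, 128), (1, 1, 1, 128), 0.2237, 0.2715, 0.82),
    ((1, 64, 64, 128), (1, 3, 3, 128), 0.375, 0.3262, 1.15),
    ((1, 64, 64, 128), (1, 7, 7, 128), 0.6882, 0.5853, 1.18),
    ((1, 64, 64, 128), (1, 9, 9, 128), 0.879, 0.7623, 1.15),
    ((1, 128, 128, 128), (1, 1, 1, 128), 1.0594, 0.9508, 1.11),
    ((1, 128, 128, 128), (1, 3, 3, 128), 2.2994, 1.1717, 1.96),
    ((1, 128, 128, 128), (1, 7, 7, 128), 5.0044, 2.2011, 2.27),
    ((1, 128, 128, 128), (1, 9, 9, 128), 7.6593, 2.9003, 2.64),
]


def decode_label(label):
    """Map a label such as 'B, B, A' to the copy method used per vector."""
    codes = [part.strip() for part in label.split(",")]
    if len(codes) != 3:
        raise ValueError(f"expected three codes, got {label!r}")
    activation, weight, output = (COPY_METHODS[code] for code in codes)
    return {"activation": activation, "weight": weight, "output": output}


def speedup(baseline, candidate):
    """Assumed definition: baseline timing over candidate timing, 2 decimals."""
    return round(baseline / candidate, 2)


if __name__ == "__main__":
    print(decode_label("B, B, A"))
    for act, wgt, nnn, aba, reported in ROWS:
        print(act, wgt, "computed:", speedup(nnn, aba), "reported:", reported)
```

Under this reading, values below 1 (the 32x32 inputs and the 64x64 input with a 1x1 weight) mean the A, B, A schedule took longer than running without any VTCM copies, while the largest configuration shows the biggest gain.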


