nverke opened a new pull request, #13005: URL: https://github.com/apache/tvm/pull/13005
…ing on hexagon.

The purpose of this test is to show how to create a pipeline that utilizes async dma copies to vtcm for performance speedup. It compares performance results between several different schedules and should serve as a good starting point for others wishing to take advantage of the new features.

Approximated Activation Shape | Approximated Weight Shape | Approximated complexity (GOPS) | Total Memory Transferred (MB) | N, N, N | S, S, S | B, B, B | A, B, B | B, B, A | A, B, A | Async DMA Speedup
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
(1, 32, 32, 128) | (1, 1, 1, 128) | 0.001 | 0.26 | 0.0547 | 0.0715 | 0.6492 | 0.6172 | 0.1398 | 0.1033 | 0.53
(1, 32, 32, 128) | (1, 3, 3, 128) | 0.005 | 0.26 | 0.0678 | 0.085 | 0.6636 | 0.6328 | 0.1526 | 0.1201 | 0.56
(1, 32, 32, 128) | (1, 7, 7, 128) | 0.026 | 0.27 | 0.1264 | 0.1463 | 0.7283 | 0.6952 | 0.2178 | 0.1843 | 0.69
(1, 32, 32, 128) | (1, 9, 9, 128) | 0.042 | 0.27 | 0.1689 | 0.1954 | 0.7742 | 0.7426 | 0.2626 | 0.2317 | 0.73
(1, 64, 64, 128) | (1, 1, 1, 128) | 0.002 | 1.05 | 0.2237 | 0.2812 | 2.6897 | 2.3175 | 0.4083 | 0.2715 | 0.82
(1, 64, 64, 128) | (1, 3, 3, 128) | 0.019 | 1.05 | 0.375 | 0.3362 | 2.7396 | 2.3716 | 0.4621 | 0.3262 | 1.15
(1, 64, 64, 128) | (1, 7, 7, 128) | 0.103 | 1.05 | 0.6882 | 0.6202 | 3.008 | 2.6313 | 0.7207 | 0.5853 | 1.18
(1, 64, 64, 128) | (1, 9, 9, 128) | 0.17 | 1.06 | 0.879 | 0.8318 | 3.1913 | 2.8071 | 0.8981 | 0.7623 | 1.15
(1, 128, 128, 128) | (1, 1, 1, 128) | 0.008 | 4.19 | 1.0594 | 1.4324 | 11.6076 | 9.2136 | 2.7771 | 0.9508 | 1.11
(1, 128, 128, 128) | (1, 3, 3, 128) | 0.075 | 4.2 | 2.2994 | 2.4282 | 11.9346 | 9.4362 | 3.0186 | 1.1717 | 1.96
(1, 128, 128, 128) | (1, 7, 7, 128) | 0.411 | 4.2 | 5.0044 | 5.1789 | 12.9821 | 10.4726 | 3.8554 | 2.2011 | 2.27
(1, 128, 128, 128) | (1, 9, 9, 128) | 0.679 | 4.2 | 7.6593 | 7.0915 | 13.7597 | 11.1763 | 4.7059 | 2.9003 | 2.64

Each column specifies the data copy method used for the Activation, Weight, and Output vectors respectively with the following options.

N = No copying to VTCM
S = Synchronous DMA copies to VTCM
B = Basic/Naive copies to VTCM
A = Asynchronous DMA copies to VTCM

For example B, B, A uses Naive copies for the Activation and Weight input vectors and uses Async DMA copies for the output vector (VTCM -> DDR).
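The column labels and the final speedup column can be read with a small helper. This is only a sketch, not code from the PR: `COPY_METHODS`, `describe_schedule`, and `speedup` are hypothetical names. It also assumes that "Async DMA Speedup" is the N, N, N time divided by the A, B, A time. The description does not state this, but the reported figures are consistent with it.

```python
# Copy-method codes, taken from the legend in the PR description.
COPY_METHODS = {
    "N": "no copy to VTCM",
    "S": "synchronous DMA copy to VTCM",
    "B": "basic/naive copy to VTCM",
    "A": "asynchronous DMA copy to VTCM",
}

# Order of the three codes in each column label.
BUFFERS = ("activation", "weight", "output")


def describe_schedule(label):
    """Expand a column label such as 'B, B, A' into per-buffer copy methods."""
    codes = [code.strip() for code in label.split(",")]
    if len(codes) != len(BUFFERS):
        raise ValueError(f"expected 3 codes, got {len(codes)}: {label!r}")
    return {buf: COPY_METHODS[code] for buf, code in zip(BUFFERS, codes)}


def speedup(baseline_time, candidate_time):
    """Ratio of baseline to candidate time, rounded to two decimals.

    Assumption: baseline is the N, N, N column and candidate is A, B, A.
    """
    return round(baseline_time / candidate_time, 2)


# (N, N, N time, A, B, A time, reported speedup) copied from three table rows.
SAMPLE_ROWS = [
    (0.0547, 0.1033, 0.53),   # (1, 32, 32, 128) activation, (1, 1, 1, 128) weight
    (0.375, 0.3262, 1.15),    # (1, 64, 64, 128) activation, (1, 3, 3, 128) weight
    (7.6593, 2.9003, 2.64),   # (1, 128, 128, 128) activation, (1, 9, 9, 128) weight
]

if __name__ == "__main__":
    print(describe_schedule("B, B, A"))
    for baseline, candidate, reported in SAMPLE_ROWS:
        print(f"computed {speedup(baseline, candidate)} vs reported {reported}")
```

Under that reading, a value below 1 (the 32x32 rows) means the async DMA schedule was slower than running without VTCM copies, and the benefit only appears for the larger activations.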
