adstraw opened a new pull request, #13147: URL: https://github.com/apache/tvm/pull/13147
[Hexagon] Enable DMA bypass with cache invalidate.

Performance data:

```
Test with A.size: 131072, W.size: 128, computational complexity of 0.001 GOPs, and total memory transfer of 0.26 MB...
-without_vtcm took 0.07 ms
-synchronous_dma took 0.3743 ms
-base_vtcm took 0.7842 ms
-async_dma_input took 0.7285 ms
-async_dma_output took 0.2298 ms
-async_dma_input_output took 0.2737 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[1024-9] Test with A.size: 131072, W.size: 1152, computational complexity of 0.005 GOPs, and total memory transfer of 0.26 MB...
-without_vtcm took 0.0878 ms
-synchronous_dma took 0.5933 ms
-base_vtcm took 0.7718 ms
-async_dma_input took 0.7208 ms
-async_dma_output took 0.2445 ms
-async_dma_input_output took 0.3216 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[1024-49] Test with A.size: 131072, W.size: 6272, computational complexity of 0.026 GOPs, and total memory transfer of 0.27 MB...
-without_vtcm took 0.1674 ms
-synchronous_dma took 0.686 ms
-base_vtcm took 0.868 ms
-async_dma_input took 0.8386 ms
-async_dma_output took 0.3148 ms
-async_dma_input_output took 0.3619 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[1024-81] Test with A.size: 131072, W.size: 10368, computational complexity of 0.042 GOPs, and total memory transfer of 0.27 MB...
-without_vtcm took 0.2253 ms
-synchronous_dma took 0.7096 ms
-base_vtcm took 0.8924 ms
-async_dma_input took 0.841 ms
-async_dma_output took 0.3642 ms
-async_dma_input_output took 0.4097 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[4096-1] Test with A.size: 524288, W.size: 128, computational complexity of 0.002 GOPs, and total memory transfer of 1.05 MB...
-without_vtcm took 0.3865 ms
-synchronous_dma took 0.9651 ms
-base_vtcm took 3.0979 ms
-async_dma_input took 2.3463 ms
-async_dma_output took 0.5277 ms
-async_dma_input_output took 0.4779 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[4096-9] Test with A.size: 524288, W.size: 1152, computational complexity of 0.019 GOPs, and total memory transfer of 1.05 MB...
-without_vtcm took 0.6494 ms
-synchronous_dma took 1.6926 ms
-base_vtcm took 3.0366 ms
-async_dma_input took 2.416 ms
-async_dma_output took 0.579 ms
-async_dma_input_output took 0.5463 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[4096-49] Test with A.size: 524288, W.size: 6272, computational complexity of 0.103 GOPs, and total memory transfer of 1.05 MB...
-without_vtcm took 1.1504 ms
-synchronous_dma took 2.1075 ms
-base_vtcm took 3.1384 ms
-async_dma_input took 2.6936 ms
-async_dma_output took 0.8734 ms
-async_dma_input_output took 0.8069 ms
PASSED
tests/python/contrib/test_hexagon/test_async_dma_pipeline.py::TestAsyncDMAPipeline::test_loading_vtcm_for_vrmpy[4096-81] Test with A.size: 524288, W.size: 10368, computational complexity of 0.17 GOPs, and total memory transfer of 1.06 MB...
-without_vtcm took 1.4473 ms
-synchronous_dma took 2.3505 ms
-base_vtcm took 3.358 ms
-async_dma_input took 2.8799 ms
-async_dma_output took 1.0477 ms
-async_dma_input_output took 1.0087 ms
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
