adstraw opened a new pull request, #13381: URL: https://github.com/apache/tvm/pull/13381
Enables Hexagon User DMA bypass mode, controlled by the user-specified `dma_bypass_cache` option, for DMA copies between DDR and VTCM.

The **upside** of this change is increased DMA bandwidth (up to 40 GBps observed using `test_vtcm_bandwidth.py`) and increased compute throughput using a 3-stage pipeline of cache read, compute, and cache write (up to 38 Gops using `test_parallel_hvx_load_vtcm.py`).

The **downside** of this change is the potential for data coherency issues, because the cache must be managed in software when DMA bypass is used. This is why the user `dma_bypass_cache` option exists to enable or disable bypass mode.

The strategy to manage the cache in software centers on the requirement for Hexagon to operate on `HexagonBuffer` objects regardless of scope (DDR or VTCM). When copying to or from a `HexagonBuffer`, we aggressively invalidate the cache for both the source and the destination, both before and after the copy. Also note that this copy is now implemented with `memcpy` instead of DMA. Because the cache is clean after a copy to or from a `HexagonBuffer`, we can now use DMA bypass mode. However, **this software cache management strategy is NOT infallible**: if a `HexagonBuffer` becomes dirty in the cache prior to a DMA with bypass mode enabled, we may see data coherency issues.

Also simplifies Hexagon DMA flows by removing the unused `mem_copy` intrinsic and its lowering, as well as the `hexagon_user_dma_1d_sync` helper function, which is replaced by calls to `HexagonUserDMA::Copy` and `HexagonUserDMA::Wait`.
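To make the coherency hazard concrete, here is a minimal toy model in Python. It is not TVM or Hexagon code: `ToyMemorySystem` and all of its methods are hypothetical names invented for illustration. The model also assumes that "invalidate" means writing back dirty lines and then dropping them, which may not match the real cache operation used by the runtime. It sketches why a buffer that is dirty in the cache can yield stale data under a cache-bypassing DMA, and why cleaning the cache around a CPU-side copy avoids that.

```python
class ToyMemorySystem:
    """Toy model: backing memory plus a write-back CPU cache (hypothetical)."""

    def __init__(self, size):
        self.mem = bytearray(size)  # stands in for DDR / VTCM
        self.cache = {}             # addr -> (value, dirty)

    def cpu_write(self, addr, value):
        # Write-back cache: the store lands in the cache only.
        self.cache[addr] = (value, True)

    def cpu_read(self, addr):
        # A miss fills the cache from memory; a hit returns the cached value.
        if addr not in self.cache:
            self.cache[addr] = (self.mem[addr], False)
        return self.cache[addr][0]

    def clean_and_invalidate(self, addr, nbytes):
        # Assumption: write dirty lines back to memory, then drop them.
        for a in range(addr, addr + nbytes):
            line = self.cache.pop(a, None)
            if line is not None and line[1]:
                self.mem[a] = line[0]

    def dma_copy_bypass(self, dst, src, nbytes):
        # The DMA engine touches memory directly and never consults the cache.
        self.mem[dst:dst + nbytes] = self.mem[src:src + nbytes]

    def managed_copy(self, dst, src, nbytes):
        # Analogue of the described buffer copy: clean source and destination
        # before and after, and copy through the CPU (memcpy-like).
        for base in (src, dst):
            self.clean_and_invalidate(base, nbytes)
        for i in range(nbytes):
            self.cpu_write(dst + i, self.cpu_read(src + i))
        for base in (src, dst):
            self.clean_and_invalidate(base, nbytes)


def stale_dma_result():
    m = ToyMemorySystem(8)
    m.cpu_write(0, 7)            # value 7 is dirty in the cache; memory holds 0
    m.dma_copy_bypass(4, 0, 1)   # DMA copies the stale 0 from memory
    return m.mem[4]


def managed_dma_result():
    m = ToyMemorySystem(8)
    m.cpu_write(0, 7)
    m.clean_and_invalidate(0, 1)  # write back before the bypass DMA
    m.dma_copy_bypass(4, 0, 1)
    return m.cpu_read(4)
```

In this model `stale_dma_result()` observes the old memory contents, while `managed_dma_result()` observes the value the CPU wrote. The first case corresponds to the failure mode described above, where a buffer becomes dirty in the cache between the managed copy and the bypass DMA.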
