adstraw opened a new pull request, #13381:
URL: https://github.com/apache/tvm/pull/13381

   Enables Hexagon User DMA bypass mode based on user-specified 
`dma_bypass_cache` option for DMA copies between DDR and VTCM.  
   
   The **upside** of this change is increased DMA bandwidth (up to 40 GBps 
observed using `test_vtcm_bandwidth.py`) and compute throughput using a 3-stage 
pipeline --- cache read, compute, cache write (up to 38 Gops using 
`test_parallel_hvx_load_vtcm.py`).
   
   The **downside** of this change is the potential for data coherency issues, 
because the cache must be managed in software when DMA bypass is used.  For 
that reason bypass mode is controlled by the user `dma_bypass_cache` option, 
which enables or disables it.
   
   The strategy to manage the cache in software centers around the requirement 
for Hexagon to operate on `HexagonBuffer` objects regardless of scope --- DDR 
or VTCM.  When copying to / from a `HexagonBuffer` we aggressively invalidate 
the cache for both the source and destination, both before and after the copy.  
Also note that this copy is now implemented with `memcpy` instead of DMA.  
Because the cache is clean after a copy to / from a `HexagonBuffer`, we can now 
use DMA bypass mode.  However, **this software cache management strategy is NOT 
infallible**: if a `HexagonBuffer` becomes dirty in the cache prior to a DMA 
with bypass mode enabled, we may see data coherency issues.
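
   As a rough illustration of the ordering described above (invalidate source 
and destination, `memcpy`, invalidate both again), here is a minimal host-side 
sketch.  It is not the code in this PR: `InvalidateCache`, `CopyWithInvalidate`, 
`CacheOp` and `g_cache_ops` are hypothetical names, and the invalidate call is 
a stub that only records what it was asked to do, so the sketch runs anywhere 
but performs no real cache maintenance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical record of one cache-maintenance request (illustration only).
struct CacheOp {
  const void* addr;
  std::size_t nbytes;
};

// Hypothetical log so the order of operations can be inspected on a host.
static std::vector<CacheOp> g_cache_ops;

// Hypothetical stand-in for a platform cache-invalidate call.
// It only logs the request; on a real target a platform cache-maintenance
// call covering [addr, addr + nbytes) would go here instead.
void InvalidateCache(const void* addr, std::size_t nbytes) {
  g_cache_ops.push_back({addr, nbytes});
}

// Copy nbytes from src to dst with memcpy, invalidating both ranges
// before and after the copy, in the order described in the text.
void CopyWithInvalidate(void* dst, const void* src, std::size_t nbytes) {
  assert(dst != nullptr && src != nullptr);
  InvalidateCache(src, nbytes);   // before copy: source
  InvalidateCache(dst, nbytes);   // before copy: destination
  std::memcpy(dst, src, nbytes);  // the copy itself is a plain memcpy, not DMA
  InvalidateCache(src, nbytes);   // after copy: source
  InvalidateCache(dst, nbytes);   // after copy: destination
}
```

   Calling `CopyWithInvalidate(dst, src, n)` once appends four entries to 
`g_cache_ops`, two before the `memcpy` and two after.  The sketch does not 
address the failure case noted above: nothing stops a buffer from becoming 
dirty in the cache between such a copy and a later DMA with bypass enabled.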
   
   This change also simplifies the Hexagon DMA flows by removing the unused 
`mem_copy` intrinsic and its lowering, as well as the 
`hexagon_user_dma_1d_sync` helper function, which is replaced by calls to 
`HexagonUserDMA::Copy` and `HexagonUserDMA::Wait`.

