adstraw commented on code in PR #13844:
URL: https://github.com/apache/tvm/pull/13844#discussion_r1091231567


##########
src/runtime/hexagon/hexagon_buffer.cc:
##########
@@ -235,19 +245,20 @@ void hexagon_buffer_copy_across_regions(const BufferSet& 
dest, const BufferSet&
 
   // Finally, do the memory copies.
   for (const auto& copy : macro_copies) {
-    // clean Hexagon cache before / after memcpy to ensure clean cache state 
to enable usage of DMA
-    // bypass mode for increased DMA bandwidth
     // TODO(HWE): Switch to ION Buffer to avoid need for memcpy and 
potentially lighten or alleviate
     // the burden of cache invalidation in this code
-    qurt_mem_cache_clean(reinterpret_cast<qurt_addr_t>(copy.dest), 
copy.num_bytes,
-                         QURT_MEM_CACHE_INVALIDATE, QURT_MEM_DCACHE);
-    qurt_mem_cache_clean(reinterpret_cast<qurt_addr_t>(copy.src), 
copy.num_bytes,
-                         QURT_MEM_CACHE_INVALIDATE, QURT_MEM_DCACHE);
     memcpy(copy.dest, copy.src, copy.num_bytes);
+
+    // We must flush and invalidate both the destination and source of the 
memcpy from the cache to
+    // ensure that DMA with cache bypass enabled will function properly. DMA 
with cache bypass
+    // enabled assumes that HexagonBuffer objects are not cached unless 
explicitly modified by the
+    // primfunc. We must flush and invalidate after the memcpy to uphold this 
assumption. Note that
+    // we need not flush and invalidate prior to the memcpy because it is 
coherent - issued through
+    // the cache.
     qurt_mem_cache_clean(reinterpret_cast<qurt_addr_t>(copy.dest), 
copy.num_bytes,
-                         QURT_MEM_CACHE_INVALIDATE, QURT_MEM_DCACHE);
+                         QURT_MEM_CACHE_FLUSH_INVALIDATE, QURT_MEM_DCACHE);
     qurt_mem_cache_clean(reinterpret_cast<qurt_addr_t>(copy.src), 
copy.num_bytes,
-                         QURT_MEM_CACHE_INVALIDATE, QURT_MEM_DCACHE);
+                         QURT_MEM_CACHE_FLUSH_INVALIDATE, QURT_MEM_DCACHE);

Review Comment:
   Prefer to keep "flush and invalidate" on the destination which is safer (if 
less performant) that a flush (only).  The idea behind this PR, which is 
captured in the comments, is that HexagonBuffer objects should not be in the 
cache unless modified by the primfunc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to