adstraw commented on code in PR #13844:
URL: https://github.com/apache/tvm/pull/13844#discussion_r1091309289
##########
src/tir/transforms/lower_async_dma.cc:
##########
@@ -192,19 +198,33 @@ class AsyncDMALowerer : public StmtExprMutator {
// save queue ID for inspection in `wait` transform
queue_ids_.insert(queue_id);
- return Evaluate(Call(DataType::Int(32), builtin::dma_copy(),
- {queue_id,
- Call(DataType::Handle(), builtin::address_of(),
- {BufferLoad(bufferstorenode->buffer, store_index)}),
- Call(DataType::Handle(), builtin::address_of(),
- {BufferLoad(bufferloadnode->buffer, load_index)}),
- for_loop->extent * bufferloadnode->dtype.bytes(), dma_bypass_cache_}));
+ auto call_dma_copy =
+ Evaluate(Call(DataType::Int(32), builtin::dma_copy(),
+ {queue_id,
+ Call(DataType::Handle(), builtin::address_of(),
+ {BufferLoad(bufferstorenode->buffer, store_index)}),
+ Call(DataType::Handle(), builtin::address_of(),
+ {BufferLoad(bufferloadnode->buffer, load_index)}),
+ for_loop->extent * bufferloadnode->dtype.bytes(), dma_bypass_cache_}));
+
+ // if the buffer we are about to DMA was modified by the primfunc
+ // then we need to flush the buffer from the cache prior to the DMA
Review Comment:
@tmoreau89 and @janetsc I think this is really good feedback, but I am a
little leery of making changes without a failing unit test to use for
test-driven development, as with `test_matmul.py` in this PR. I imagine that
software cache management to enable DMA bypass on Hexagon will be an iterative
process. It seems like you are pointing to the next iteration, based on the VTA
example. I would like to let this PR move through on its own merit and then
address follow-on cases, if possible. Thoughts?