supersat commented on issue #13976:
URL: https://github.com/apache/tvm/issues/13976#issuecomment-1460744645

   I am working on a minimal test case to demonstrate the issue.
   
   Essentially, we were writing TIR to offload some compute to an NPU. In this 
primfunc, we allocate some NPU RAM, copy partial tensors into it, invoke the 
compute, and copy partial results out of the allocated NPU RAM. We had a 
rolling buffer similar to what the `rolling_buffer` schedule primitive 
generates, but hand-written. We invoke the NPU through a `call_extern` and pass 
in the rolling buffer offsets computed by `T.address_of`. The actual memory 
accesses happen inside the external function. Since TVM can't see them, it 
ended up reusing memory: the output buffer was placed over part of the input 
rolling buffer, which generated nonsense results.
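
Until the minimal test case is ready, the failure mode can be sketched with a toy model in plain Python. This is not TVM code and does not reproduce TVM's actual memory-reuse pass. The names `plan_offsets`, `fake_npu_kernel`, and `run`, the three-statement program, and the stencil kernel are all hypothetical, chosen only to show how a planner that misses accesses made inside an opaque call can alias two buffers that are live at the same time.

```python
# Toy model (assumption: NOT TVM's real planner). Statements are numbered:
#   0: copy an input tile into the rolling buffer
#   1: opaque extern call that reads `rolling` and writes `out`
#   2: copy results out of `out`

def plan_offsets(lifetimes, sizes):
    """Greedy first-fit: buffers may share memory if their lifetimes are disjoint."""
    offsets = {}
    for name, (start, end) in lifetimes.items():
        offset = 0
        # Walk already-placed buffers in address order and step past conflicts.
        for other, other_off in sorted(offsets.items(), key=lambda kv: kv[1]):
            o_start, o_end = lifetimes[other]
            lifetimes_overlap = start <= o_end and o_start <= end
            ranges_overlap = (offset < other_off + sizes[other]
                              and other_off < offset + sizes[name])
            if lifetimes_overlap and ranges_overlap:
                offset = other_off + sizes[other]
        offsets[name] = offset
    return offsets

def fake_npu_kernel(mem, in_off, out_off, n):
    """Stand-in for the external function: out[i] = in[i] + in[i - 1]."""
    for i in range(n):
        prev = mem[in_off + i - 1] if i > 0 else 0
        mem[out_off + i] = mem[in_off + i] + prev

def run(lifetimes):
    sizes = {"rolling": 4, "out": 4}
    offsets = plan_offsets(lifetimes, sizes)
    mem = [0] * 8
    for i, v in enumerate([1, 2, 3, 4]):                          # statement 0
        mem[offsets["rolling"] + i] = v
    fake_npu_kernel(mem, offsets["rolling"], offsets["out"], 4)   # statement 1
    return offsets, mem[offsets["out"]:offsets["out"] + 4]        # statement 2

# What the planner sees if the extern call is opaque: the lifetimes look disjoint.
VISIBLE_LIFETIMES = {"rolling": (0, 0), "out": (2, 2)}
# What actually happens: both buffers are live during statement 1.
ACTUAL_LIFETIMES = {"rolling": (0, 1), "out": (1, 2)}

if __name__ == "__main__":
    print(run(VISIBLE_LIFETIMES))  # buffers share offset 0, results are corrupted
    print(run(ACTUAL_LIFETIMES))   # buffers are kept apart, results are correct
```

With the visible lifetimes, both buffers land at offset 0 and the kernel overwrites inputs it has not finished reading. With the actual lifetimes, the planner keeps them apart.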

