supersat commented on issue #13976: URL: https://github.com/apache/tvm/issues/13976#issuecomment-1460744645
I am working on a minimal test case to demonstrate the issue. Essentially, we were writing TIR to offload some compute to an NPU. In this primfunc, we allocate some NPU RAM, copy partial tensors into it, invoke the compute, and copy partial results out of the allocated NPU RAM. We had a rolling buffer similar to what the `rolling_buffer` schedule primitive generates, but hand-written. We invoke the NPU through a `call_extern` and pass in the rolling buffer offsets computed by `T.address_of`. The actual memory access happens inside the external function, and since TVM can't see it, TVM ended up reusing memory: the output buffer was placed over part of the input rolling buffer, which produced nonsense results.
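To make the hazard concrete, here is a toy model in plain Python. It is not TVM code and does not reproduce TVM's actual memory planner; the op list, the buffer names `rolling_in` and `out`, and the helpers `live_range` and `may_share_storage` are all hypothetical, chosen only to mirror the copy-in, extern call, copy-out sequence described above. It shows how a planner that counts only visible accesses can conclude that two buffers never overlap in time, even though an opaque call touches both.

```python
# Toy model of buffer liveness. All names here are illustrative assumptions,
# not TVM APIs.

def live_range(ops, buf, include_hidden):
    """Return (first, last) indices of the ops that touch `buf`."""
    touched = [
        i
        for i, op in enumerate(ops)
        if buf in op["visible"] or (include_hidden and buf in op["hidden"])
    ]
    return touched[0], touched[-1]


def may_share_storage(ops, a, b, include_hidden):
    """Two buffers may share storage only if their live ranges are disjoint."""
    a_first, a_last = live_range(ops, a, include_hidden)
    b_first, b_last = live_range(ops, b, include_hidden)
    return a_last < b_first or b_last < a_first


# "visible" accesses are ones the planner can analyse directly.
# "hidden" accesses happen through raw pointers inside the external function.
ops = [
    {"name": "copy_in", "visible": {"rolling_in"}, "hidden": set()},
    {"name": "npu_invoke", "visible": set(), "hidden": {"rolling_in", "out"}},
    {"name": "copy_out", "visible": {"out"}, "hidden": set()},
]

# Ignoring the hidden accesses, the input looks dead before the output is born.
naive = may_share_storage(ops, "rolling_in", "out", include_hidden=False)

# Treating the opaque call as a use of both buffers keeps them apart.
conservative = may_share_storage(ops, "rolling_in", "out", include_hidden=True)

print("naive planner allows sharing:", naive)
print("conservative planner allows sharing:", conservative)
```

In this sketch the naive analysis permits the output to overlap the input, which matches the symptom reported here, while the conservative analysis does not.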
